This is part 2 of a series exploring the current state of variable data printing. Part 1 was Insights on VDP.
By Nicholas Barzelay
The key feature of a desktop database is the capability to see both the structure of the data and the data itself. This view resembles a spreadsheet, but provides processing functions closer to those of a more robust DBMS, including the use of SQL (Structured Query Language). Beyond data visualization, other key features are point-and-click and drag-and-drop functionality.
Desktop databases can hold considerable amounts of data (over a terabyte) spread across multiple storage disks. They also include the functionality to programmatically manipulate large amounts of data: reworking data structures and data content, and transforming data (via exports) into simple file formats, spreadsheets, or XML (Extensible Markup Language).
While there is a potential performance downside to using a desktop database, the work production rate remains favorable. Any production impact is substantially counterbalanced by the advantages of working in a visual environment, which makes a desktop database well suited to preparing VDP data streams. Such preparation tasks include data cleansing, data content adaptation, and near-ready file exports for XML workflows.
Developing data queries with “query-by-example” in a WYSIWYG (what you see is what you get) interactive work process is considerably easier than developing a query programmatically by trial and error. Queries built with query-by-example resolve to SQL queries: in practice, it means developing SQL statements using point-and-click, drag-and-drop, menus, and pre-built templates.
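To make the idea concrete, here is a minimal sketch of the kind of SQL a query-by-example grid typically resolves to. The table and field names are illustrative assumptions, not from any particular product; in a QBE grid, the checked columns become the SELECT list and the criteria typed under a column become the WHERE clause.

```python
import sqlite3

# A hypothetical customer table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (first_name TEXT, last_name TEXT, city TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 [("Ann", "Lee", "Rochester"),
                  ("Bob", "Ray", "Buffalo"),
                  ("Cal", "Fox", "Rochester")])

# The SQL a QBE grid might generate from two checked columns and one criterion.
sql = "SELECT first_name, last_name FROM customers WHERE city = ? ORDER BY last_name"
rows = conn.execute(sql, ("Rochester",)).fetchall()
print(rows)  # [('Cal', 'Fox'), ('Ann', 'Lee')]
```

The point of QBE is that a user never types this statement; the visual grid produces it, which is why it is a gentler path to database expertise than hand-coding SQL.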
Such capabilities provide a good basis for developing database expertise in an organization, while at the same time producing results needed for successful VDP development to meet customer requirements.
Clean data is a necessity prior to generating variable documents – particularly customer communications. The old adage “garbage in, garbage out” applies. The usual suspects are misspellings and duplications (the same person under corrupted spellings). Some are easier to repair than others, and a few may slip through. There may also be errors such as incorrect capitalization (all upper case, all lower case, or mixed case in the wrong places). These are but a few of the problems, and they usually stem from improperly input or acquired data.
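A minimal cleansing sketch of the two problems just named – bad capitalization and near-duplicate records. The sample names and the similarity cutoff are assumptions for illustration; production rules would need to handle cases like “McDonald” or “van der Berg”.

```python
from difflib import SequenceMatcher

records = ["JOHN SMITH", "john smyth", "Mary Jones", "MARY JONES"]

def normalize(name):
    # Title-case each word; real data needs extra rules for names
    # like "McDonald" or "van", which simple capitalization gets wrong.
    return " ".join(w.capitalize() for w in name.split())

cleaned = [normalize(r) for r in records]

# Flag pairs whose normalized forms are nearly identical (possible duplicates).
suspects = []
for i in range(len(cleaned)):
    for j in range(i + 1, len(cleaned)):
        if SequenceMatcher(None, cleaned[i], cleaned[j]).ratio() > 0.85:
            suspects.append((cleaned[i], cleaned[j]))

print(cleaned)
print(suspects)
```

Exact duplicates can be repaired automatically; the near-matches (“Smith” vs. “Smyth”) are exactly the cases the article warns may slip through, and they usually need a human decision.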
Once datasets have been cleansed, they need to be maintained for repurposing. It makes no economic sense to repeat completed work. Maintenance should include comparing datasets for matching data and correcting any variations where matches are found. Such corrections can be applied from the datasets back into the originating databases, if available for correction. Clean data is always desirable. However, cleansing is not the only data preparation needed.
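The maintenance step described above – comparing datasets on matching records and carrying corrections back – can be sketched as follows. The shared key and field names are assumptions; the idea is simply that the maintained master dataset wins where the two disagree.

```python
# Hypothetical datasets keyed on a shared customer ID.
master = {101: {"first": "John", "last": "Smith"},
          102: {"first": "Mary", "last": "Jones"}}

project = {101: {"first": "Jon", "last": "Smith"},   # corrupted spelling
           102: {"first": "Mary", "last": "Jones"}}

# Where a matched record differs, record the correction and apply it.
corrections = {}
for key, rec in project.items():
    if key in master and rec != master[key]:
        corrections[key] = master[key]
        project[key] = dict(master[key])

print(corrections)  # only record 101 needed fixing
```

The same `corrections` map could then be applied back to the originating database, if it is available for correction, so the cleansing work is never repeated.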
Adapting a dataset to a project and a given set of design requirements is a further step. The reason for this has to do with how the data elements for each record are used in the document design.
One common requirement (especially for XML-based data) is that the data elements for a record are placed into a layout in a sequential order set by the XML. Data element order in the dataset must match data element order in the design to facilitate this.
A second requirement (also related particularly to XML-based data) is that within a record, a data element may be used only once. For instance, if a record has address data (first name, last name, etc.) and the design calls for using the first name in two additional places, then two more first name data elements need to be added to each record to satisfy the design requirement.
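The single-use requirement can be met mechanically: if the design uses the first name in three places, each record gets two extra copies of it as separate fields. The field names below are illustrative assumptions.

```python
# Hypothetical records; each field may be placed in the layout only once.
records = [{"first_name": "Ann", "last_name": "Lee"},
           {"first_name": "Bob", "last_name": "Ray"}]

# The design calls for the first name in two additional places,
# so add two more first-name elements to every record.
for rec in records:
    rec["first_name_2"] = rec["first_name"]
    rec["first_name_3"] = rec["first_name"]

print(records[0])
```

This duplication looks redundant from a database-design standpoint, but it satisfies the one-element-per-placement rule of the XML workflow.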
Additional data elements may be needed to carry decision flags for sorting, selection, or applying a snippet of business logic. References to digital images stored outside the database at specific file locations (URIs – uniform resource identifiers) will require adding further data elements at the record level.
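A sketch of adding both kinds of element per record. The flag rule and the file-path scheme are illustrative assumptions, not prescribed by any workflow.

```python
# Hypothetical records carrying a purchase count used by business logic.
records = [{"first_name": "Ann", "last_name": "Lee", "purchases": 12},
           {"first_name": "Bob", "last_name": "Ray", "purchases": 2}]

for rec in records:
    # Decision flag: e.g. select a premium offer for frequent buyers.
    rec["premium_flag"] = "Y" if rec["purchases"] >= 10 else "N"
    # URI pointing at an image stored outside the database.
    rec["photo_uri"] = f"file:///assets/{rec['last_name'].lower()}.jpg"

print(records[0]["premium_flag"], records[0]["photo_uri"])
```

Carrying the flag in the data, rather than computing it in the composition tool, keeps the business logic in one visible, testable place.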
Therefore, a considerable amount of structural data work, over and beyond cleansing and formatting, may be needed to facilitate successful generation of the individual variable documents. Once the data (records and fields) have been established in the desktop database, they can be exported into an XML data stream. This approach works very nicely with a tool such as Adobe InDesign.
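The export step can be sketched with the standard library. The tag names and field order are assumptions; the point is that the elements are emitted in exactly the sequence the document design expects.

```python
import xml.etree.ElementTree as ET

# Hypothetical cleansed records ready for export.
records = [{"first_name": "Ann", "last_name": "Lee", "city": "Rochester"}]
field_order = ["first_name", "last_name", "city"]  # must match the design

root = ET.Element("records")
for rec in records:
    node = ET.SubElement(root, "record")
    for field in field_order:
        # Emit elements in the design's order, one element per placement.
        ET.SubElement(node, field).text = rec[field]

xml_stream = ET.tostring(root, encoding="unicode")
print(xml_stream)
```

A stream in this shape can then be taken up by an XML-aware composition tool such as Adobe InDesign.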
Much of the preparatory data work needed for producing VDP documents can be easily accomplished in a desktop database. Considering that data preparation for a VDP solution may well be the most important effort in a project and take a significant amount of time, making data and content manipulation as simple and easy as possible is an advantage that will pay dividends in achieving quality results.
Additionally, addressing the complexities of content preparation up front will go a long way toward simplifying the entire workflow. Many issues can be resolved in the database, making it easier to integrate the content stream into the document design. A data preparation tool that can provide a level of visualization similar to the document design tool is a necessity. Using a desktop DBMS will provide this benefit.
Editor’s Note: Nicholas Barzelay is a recent graduate of the RIT School of Print Media’s graduate program. Mr. Barzelay’s research area was variable data printing. He is the co-author of Upstream Database and Digital Asset Management in Variable Data Printing (Executive Summary available here). Mr. Barzelay is working on two books on the subjects of data management and variable data printing, and is sharing some of his work to get industry feedback.