XML Processing for VDP

By Nicholas Barzelay | July 28, 2009

Simple File Structure

A file is composed of records, which may or may not be uniquely identified with a record key. Records may also be identified by a physical mechanism in the file called an inter-record gap (IRG). A program reads a record until it comes to a new key or the IRG. At the new key, it logically (programmatically) comprehends that it is now reading a new record. At the IRG, the program spaces over the gap and logically concludes that it is reading a new record.

Records are composed of data elements or fields. The fields are in a set sequence and contain text or numeric data – for instance, address, name, or quantity fields. A program may also know that it has come to a new record when the field sequence starts over – it has again encountered the first field in the sequence.


Data elements in XML are identified by start and end tags, which reflect the content of the element, for instance “<last_name>Smithers</last_name>”. An XML file of repeating data is essentially a set of records where the tagged elements occur in a set sequence. Therefore any time a program reads the first tagged element in a sequence, it knows that it has started a new XML group – a new record for all practical purposes.
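The idea that the first tagged element in the sequence marks a new record can be sketched in a few lines. This is only an illustration, written in Python as a stand-in for any scripting language; the function name `split_records`, the `first_name` field, and the sample values other than “Smithers” are made up for the example, and it assumes simple elements with no attributes or nesting.

```python
import re

# Matches one simple element such as <last_name>Smithers</last_name>.
# Assumption: no attributes and no nested tags inside the element.
ELEMENT = re.compile(r"<(\w+)>(.*?)</\1>")

def split_records(lines, first_tag):
    """Group tagged elements into records; first_tag starts a new record."""
    records = []
    current = None
    for line in lines:
        for tag, value in ELEMENT.findall(line):
            if tag == first_tag:
                # The field sequence has started over: begin a new record.
                current = {}
                records.append(current)
            if current is not None:
                current[tag] = value
    return records

# Hypothetical sample data: two records with no container tag around them.
sample = [
    "<last_name>Smithers</last_name>",
    "<first_name>Ann</first_name>",
    "<last_name>Jones</last_name>",
    "<first_name>Pat</first_name>",
]

records = split_records(sample, "last_name")
for record in records:
    print(record)
```

Elements that appear before the first occurrence of `first_tag` are ignored, since they do not belong to any record.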

Since XML is hierarchical, there may be a repeating tag set (start and end tags) that serves as a container for a set of tagged data elements – essentially a record container. The tag name of the container for a sequence of XML data elements, for example, may conceivably be “<record>” as the start tag and “</record>” as the end tag.


Consequently, simple file processing (the sequential read of data elements within a sequential set of records) is feasible for XML files, based on the use of tag names as record and data identifiers. The XML file can be processed procedurally with simple reads and without the nuanced processing associated with XML object model processing.
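A sequential read of this kind might look like the following sketch. It is an illustration under stated assumptions, not a general XML parser: it assumes each tag or element sits on its own line, that the container is named “record”, and that fields carry no attributes. The names `read_records` and `FIELD`, the outer “records” tag, and the sample data are invented for the example.

```python
import io
import re

# One simple field per line, e.g. <quantity>3</quantity> (no attributes).
FIELD = re.compile(r"<(\w+)>([^<]*)</\1>")

def read_records(stream, container="record"):
    """Yield each record as a list of (tag, value) pairs, one line at a time."""
    open_tag = f"<{container}>"
    close_tag = f"</{container}>"
    fields = None
    for line in stream:
        text = line.strip()
        if text == open_tag:
            fields = []              # start of a new record
        elif text == close_tag:
            if fields is not None:
                yield fields         # end of the record: hand it back
            fields = None
        elif fields is not None:
            fields.extend(FIELD.findall(text))

# Hypothetical sample; io.StringIO stands in for an open file.
sample = """\
<records>
<record>
<last_name>Smithers</last_name>
<quantity>3</quantity>
</record>
<record>
<last_name>Jones</last_name>
<quantity>5</quantity>
</record>
</records>
"""

parsed = list(read_records(io.StringIO(sample)))
for fields in parsed:
    print(fields)
```

Because the function is a generator that holds only one record at a time, the same loop works on a very large file opened with `open()` without loading the whole file into memory.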

Content and structural changes can be made to very large files of repeating XML data very quickly using a scripting language such as Perl or Ruby. This gives the VDP developer additional data handling alternatives if true XML support in the form of XSLT (Extensible Stylesheet Language Transformations) is not available.
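As a rough sketch of what such a script could do, the example below makes one structural change (renaming a tag) and one content change (upper-casing a field value) while passing over the data line by line. Python stands in here for Perl or Ruby. The function names, the “surname” tag, and the sample lines are hypothetical, and the patterns assume simple tags without attributes.

```python
import re

def rename_tag(line, old, new):
    """Structural change: rewrite <old> and </old> as <new> and </new>."""
    return re.sub(rf"<(/?){re.escape(old)}>", rf"<\g<1>{new}>", line)

def upper_field(line, tag):
    """Content change: upper-case the text inside <tag>...</tag>."""
    name = re.escape(tag)
    pattern = rf"(<{name}>)(.*?)(</{name}>)"
    return re.sub(
        pattern,
        lambda m: m.group(1) + m.group(2).upper() + m.group(3),
        line,
    )

def transform(lines):
    """Apply both changes to each line in turn, one line at a time."""
    for line in lines:
        line = rename_tag(line, "last_name", "surname")
        yield upper_field(line, "surname")

# Hypothetical sample lines; a real run would iterate over an open file.
sample = [
    "<record>\n",
    "<last_name>Smithers</last_name>\n",
    "<quantity>3</quantity>\n",
    "</record>\n",
]

out = list(transform(sample))
print("".join(out), end="")
```

Lines that do not contain the targeted tag pass through unchanged, so the script can be pointed at the whole file without first isolating the records of interest.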

As another processing option when serious programmatic support is limited and where data changes are repetitive across the entire data set, the simple “find and replace” logic common to most text editors and word processors can be effective.


Many applications and technologies today contain considerably more functionality than what is required for common use. It is not necessary to know how to use every capability a technology offers – only the ones needed to get the job done.

There is usually more than one way to get the job done. For instance, setting up an XML data stream can be done by using an XML capability (XSLT), by processing the XML as XML with a program, by processing the XML as a text file with a program, or by processing the XML as a text file with a word processor’s “search and replace” functions.

From the perspective of skill sets, this means that there is considerable flexibility in finding a workable, productive combination of tools and resident (or potentially resident) skills that can efficiently and effectively do the needed work.

Once again, what facilitates such flexibility is a fundamental set of tools and workflow that allows iterative, recursive, and retrogressive tasks during the design and development steps of the VDP document prior to sending the generated job stream to press.



4 thoughts on “XML Processing for VDP”

  1. Todd Chronister

    Shouldn’t this entire blog post be the first paragraph? I was reading on hoping to gain some practical knowledge, but it is common. Those who find this to be insightful will have no understanding of how to implement something like Ruby to meet their ends. I do hope you plan to continue this topic in something like a ‘part 2’. This reminds me of someone I met recently at the GUA in Orlando who told me he was using PHP to generate his impositions for press. Sounds logical, doesn’t it, until you really get to it and all of a sudden the frame widens and the ROI to actually build, test, revise, proof, and launch is an expensive exercise of reinventing the wheel.

  2. Eliot Harper

    XML provides a level of flexibility, but I’m not convinced it’s suitable as a VDP data source. While it might be common practice to use delimited text files for VDP, these delimited files are easy to view and edit as tabular text (i.e. open them in Excel) and only contain field names in the first line. Yes, I know you can open an XML file in Excel, but remember that XML contains a tag for every single field in each record, which, when you have thousands of records, produces an unnecessarily long text file to parse.

    If you really want to use a flat text file data source, then I’d recommend sticking with good old delimited text. And if you really need to include relationships in your data source, then use a database connection to a relational database!

    I’m not saying XML doesn’t have a place in VDP, it absolutely does–as a document format (like XMPie XLIM or XSL-FO used by Scriptura). And XML isn’t really a suitable format for VDP output (i.e. PPML), but I’ll save that discussion for another thread…

  3. Nick Barzelay

    Let me net it out. All this post says is that the combination of XML-tagged record-organized repeating data and a text-oriented programming language like Perl or Ruby is an excellent way to prepare a data stream (including digital asset URIs) for generating VDP documents with Adobe InDesign. I am over half way along in writing a book that addresses this and related technical topics in considerable detail.

    Regarding XMPie, that VDP application (according to a trainer from XMPie) uses a scaled-down version of the Adobe InDesign Server Engine. If you follow their process, they convert data to XML before feeding it into the InDesign engine. If there is a problem with the data, one has to revert to the raw data to make the correction and then convert to XML again in order to run the corrected data stream. My point is that the XML data stream can be managed directly without going back to the raw data.

    The other thing that I’ve found (and this has been confirmed to me by others in print and graphic design) is that the XML automation capabilities of Adobe InDesign (as far back as CS2) tend to be misunderstood, unexplored, or unknown. The common thinking is that one has to have a plug-in in order to use InDesign to create VDP, otherwise it is just a layout application.

  4. Michael J


    You say “XML provides a level of flexibility, but I’m not convinced it’s suitable as a VDP data source.” Your point would be more reasonable if it included “suitable for X”. If X= printers, I tend to agree. But not because of the capabilities of XML. It’s because of the natural experience and skills of printing craftspeople.

    Back in the day I worked on a project for Grow Network. All XML all the time. The company produced among other things, 1000’s of customized workbooks for students in Texas. The content was based on what they got wrong on the standardized tests. The delivery of personalized review material resulted in about a 15% increase in students passing the retest.

    It was all XML. All open source. The page layout was done with XFlo (I think. I’m not a software geek. Just a printing advisor.) At the time it also took the dedicated efforts of some of the best software engineers I’ve ever met.

    In my not so humble opinion, for most printers most of the time, mail merge on steroids is plenty good enough. If you, a printer, want to do more, instead of buying tools I would recommend having the tech smart skills on staff or spending the money to partner with an outfit that is expert at all the stuff you have to be expert in to make this practical and margin producing.
