By Nicholas Barzelay
Simple File Structure
A file is composed of records, which may or may not be uniquely identified by a record key. Records may also be delimited by a physical mechanism in the file called an inter-record gap (IRG). A program reads a record until it comes to a new key or the IRG. At a new key, it logically (programmatically) recognizes that it is now reading a new record. At the IRG, the program spaces over it and logically concludes that it is reading a new record.
Records are composed of data elements, or fields. The fields occur in a set sequence and contain text or numeric data, for instance address, name, or quantity fields. A program can also recognize a new record when the field sequence starts over, that is, when it again encounters the first field in the sequence.
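The key-based case can be sketched in a few lines of Python. This is only an illustration: the function name `group_by_key`, the sample keys, and the assumption that each input row arrives as a (record key, field value) pair are hypothetical and not taken from any particular file format.

```python
from typing import Iterable, Iterator


def group_by_key(rows: Iterable[tuple[str, str]]) -> Iterator[tuple[str, list[str]]]:
    """Yield (key, field values) each time the record key changes."""
    current_key = None
    values: list[str] = []
    for key, value in rows:
        # A different key means the previous record is complete.
        if current_key is not None and key != current_key:
            yield current_key, values
            values = []
        current_key = key
        values.append(value)
    # Emit the last record, which has no following key to close it.
    if current_key is not None:
        yield current_key, values


# Hypothetical input: each row is (record key, field value).
rows = [
    ("1001", "Smithers"),
    ("1001", "12 Elm St"),
    ("1002", "Jones"),
    ("1002", "4 Oak Ave"),
]
records = list(group_by_key(rows))
```

Here `records` holds one entry per key, each with its field values in the order they were read.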
Data elements in XML are identified by start and end tags, which reflect the content of the element, for instance “<last_name>Smithers</last_name>”. An XML file of repeating data is essentially a set of records in which the tagged elements occur in a set sequence. Therefore, any time a program reads the first tagged element in the sequence, it knows that it has started a new XML group, which is a new record for all practical purposes.
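A minimal sketch of this idea follows, assuming each data element sits on its own line and contains no nested tags or attributes. The function name `read_records`, the `city` element, and the sample values are illustrative assumptions; only `last_name` comes from the example above.

```python
import re

# Matches one simple element on a line: <name>text</name>
ELEMENT = re.compile(r"<(\w+)>(.*?)</\1>")


def read_records(lines, first_tag):
    """Group tagged elements into records, starting a new record
    whenever the first tag in the sequence appears again."""
    record = {}
    for line in lines:
        match = ELEMENT.search(line)
        if match is None:
            continue  # skip lines that hold no simple element
        tag, text = match.group(1), match.group(2)
        if tag == first_tag and record:
            yield record  # the sequence has started over
            record = {}
        record[tag] = text
    if record:
        yield record  # last record in the input


# Hypothetical sample data with no container element.
sample = """\
<last_name>Smithers</last_name>
<city>Springfield</city>
<last_name>Jones</last_name>
<city>Shelbyville</city>
""".splitlines()

records = list(read_records(sample, "last_name"))
```

Each item in `records` is a dictionary mapping tag names to text, one dictionary per repetition of the sequence.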
Since XML is hierarchical, there may be a repeating tag set (start and end tags) that serves as a container for a set of tagged data elements, essentially a record container. The container for a sequence of XML data elements might, for example, use “<record>” as the start tag and “</record>” as the end tag.
Consequently, simple file processing (the sequential read of data elements within a sequential set of records) is feasible for XML files, based on the use of tag names as record and data identifiers. The XML file can be processed procedurally with simple reads and without the nuanced processing associated with XML object model processing.
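As a rough sketch of such procedural processing, the Python below reads a stream one line at a time and uses a container tag to mark record boundaries. It assumes one tag per line and no attributes, so it is not a general XML parser; the function name `iter_records`, the `<records>` root element, and the `quantity` element are hypothetical.

```python
import io
import re

# Matches a whole line holding one simple element: <name>text</name>
ELEMENT = re.compile(r"<(\w+)>(.*?)</\1>")


def iter_records(stream, container="record"):
    """Yield one dict per container element, reading line by line."""
    start, end = f"<{container}>", f"</{container}>"
    record = None  # None means we are outside any record
    for line in stream:
        text = line.strip()
        if text == start:
            record = {}
        elif text == end:
            if record is not None:
                yield record
            record = None
        elif record is not None:
            match = ELEMENT.fullmatch(text)
            if match:
                record[match.group(1)] = match.group(2)


# io.StringIO stands in for an open file in this sketch.
xml_text = """\
<records>
  <record>
    <last_name>Smithers</last_name>
    <quantity>3</quantity>
  </record>
  <record>
    <last_name>Jones</last_name>
    <quantity>5</quantity>
  </record>
</records>
"""
records = list(iter_records(io.StringIO(xml_text)))
```

Because only the current record is held in memory, the same loop can be pointed at an open file of any size.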
Content and structural changes can be made to very large files of repeating XML data very quickly using a scripting language such as Perl or Ruby. This gives the VDP developer additional data handling alternatives if true XML support in the form of XSLT (Extensible Stylesheet Language Transformations) is not available.
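The sketch below illustrates both kinds of change, written in Python rather than Perl or Ruby. It assumes simple tags without attributes; the helper names (`rename_tag`, `upper_content`, `transform`, `edit`) and the `surname` tag are hypothetical. The tag rename is a structural change, and the uppercase conversion is a content change.

```python
import io
import re


def rename_tag(line, old, new):
    """Structural change: rename start and end tags `old` to `new`."""
    return re.sub(rf"<(/?){re.escape(old)}>", rf"<\g<1>{new}>", line)


def upper_content(line, tag):
    """Content change: uppercase the text inside `tag` elements."""
    name = re.escape(tag)
    pattern = rf"(<{name}>)(.*?)(</{name}>)"
    return re.sub(
        pattern,
        lambda m: m.group(1) + m.group(2).upper() + m.group(3),
        line,
    )


def transform(src, dst, edit):
    """Copy src to dst one line at a time, applying `edit` to each line."""
    for line in src:
        dst.write(edit(line))


def edit(line):
    line = rename_tag(line, "last_name", "surname")
    return upper_content(line, "surname")


# io.StringIO stands in for input and output files in this sketch.
src = io.StringIO("<record>\n<last_name>Smithers</last_name>\n</record>\n")
dst = io.StringIO()
transform(src, dst, edit)
result = dst.getvalue()
```

Since each line is read, changed, and written in turn, memory use does not grow with file size, which is what makes this approach practical for very large files.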
As another processing option, when serious programmatic support is limited and the data changes are repetitive across the entire data set, the simple “find and replace” logic common to most text editors and word processors can be effective.
Many applications and technologies today contain considerably more functionality than is required for common use. It is not necessary to know how to use every capability a technology offers, only the ones needed to get the job done.
There is usually more than one way to get the job done. For instance, setting up an XML data stream can be done by using an XML capability (XSLT), by processing the XML as XML with a program, by processing the XML as a text file with a program, or by processing the XML as a text file using a word processor’s “search and replace” functions.
From the perspective of skill sets, this means that there is considerable flexibility in finding a workable, productive combination of tools and resident (or potentially resident) skills that can do the needed work efficiently and effectively.
Once again, what facilitates such flexibility is a fundamental set of tools and a workflow that allows iterative, recursive, and retrogressive tasks during the design and development steps of the VDP document, prior to sending the generated job stream to press.