Programming Languages for VDP

August 19, 2009

By Nicholas Barzelay

The question “What is the best programming language for VDP?” remains persistently unanswered for those developing, or contemplating developing, variable data printing (VDP) applications. There are a great many programming languages these days, and some are more suitable than others for handling textual and numeric data. That is one consideration, but beyond functionality, execution method may be a more relevant first step in making a selection.

Language Execution Method

For purposes here, there are two types of languages: compiled languages and interpreted languages. In a compiled language, the program code is checked for errors and then converted into machine-executable instructions by a software application called a compiler. Compilation takes place once, and programs are recompiled only if they are later modified. This produces very fast programs that are very stable, in the sense that they are not intended to change with every run. Such languages are very good for developing application frameworks, graphical user interfaces (GUIs), function libraries, and standard (shrink-wrapped) application software packages.

Programs written in an interpreted language are converted into machine-executable instructions as the interpreter reads the written program. This means that every time the program is run, it is interpreted again (essentially the same as being compiled at runtime). As a result, interpreted programs generally run somewhat slower than compiled programs. However, interpreted programs are very flexible because they can be modified and run on the fly. This makes them ideal for VDP jobs, because each processing run is liable to be different. Program logic that will be reused with variations can be retained as programming templates or sets of functions that can be customized prior to running.
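
As a rough illustration of the "template plus small functions" idea, here is a minimal sketch in Python, used only as a stand-in for any interpreted language (the languages recommended below are Perl and Ruby). The field names, the sample records, and the `render_letters` helper are all hypothetical; in a real job the data would come from the customer's file.

```python
import csv
import io
from string import Template

# Hypothetical per-job settings: these are the parts edited before each run.
LETTER = Template("Dear $first_name,\nYour $product order ships to $city.\n")
SAMPLE_DATA = """first_name,product,city
Ana,brochure,Boston
Raj,postcard,Denver
"""

def render_letters(data_file, template):
    """Return one personalized text block per record in a CSV data file."""
    reader = csv.DictReader(data_file)
    return [template.substitute(row) for row in reader]

# io.StringIO stands in for an open file so the sketch is self-contained.
letters = render_letters(io.StringIO(SAMPLE_DATA), LETTER)
for letter in letters:
    print(letter)
```

For the next job, only the template text and the data source would need to change; the function itself is reused as-is.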

Rationale

Most VDP processing involves data file or XML manipulation, as opposed to creating fixed-logic applications, GUIs, or code libraries. Therefore, programming languages like C, C++, Objective-C, Java, and C# are really not needed for everyday VDP development. It is unlikely that a printer or ad agency, for example, is going to be building application software packages. JavaScript, VBScript, and VB (Visual Basic) have their own functional or platform limitations.

Because every VDP job is liable to use different sets of data or require different processing, a high degree of flexibility is needed. Interpreted languages such as Perl or Ruby are more appropriate selections. Both are very quick at runtime, and extremely well suited for text processing (the primary need when manipulating data or raw text for VDP).

Training is relatively straightforward for both Perl and Ruby. A strong open source developer community supports both. Documentation and tutorials are plentiful. And both languages are cross-platform, meaning, for example, that the same program can run on Mac OS X, Windows, Linux, or Unix.


9 thoughts on “Programming Languages for VDP”

  1. Michael J

    Interesting post. Especially if taken to give printers a conversational knowledge of what’s going on behind the curtain. The problem is if printers see this as something to put on a to-do list.

    “Training is relatively straightforward for both Perl and Ruby.” It’s possible to get a conversational knowledge of the issues and the general outlines easily and quickly. But if VDP is reframed as database publishing, and the software is reframed as manipulating and analyzing a database, it becomes pretty clear, pretty fast, that complex applications are a job for experts.

    The good news for printers is that there are many ways to solve the problem for much less complex projects. I noted with interest that Mimeo has built their success without having any VDP capability until just now. For most printers, for most jobs, mail merge is fine. Something a little more sophisticated might need MailShop. To deliver the heavy-duty analytics that are the really high-margin deliverable, why not use Mindfire?

    To be clear, I’m retired and have no interest in any of these companies.

  2. George Alexander

    I agree with Nicholas’ approach here, but it is important to set the context a bit. I would state it this way: for the mix of day-to-day tasks that a VDP programmer must perform, Perl and Ruby are good choices. (I’m partial to Perl personally. I hear good things about Ruby, but haven’t tried it yet.)

    When setting up a VDP job, printers have to accept whatever files their customers can provide. VDP programmers are often faced with simple, one-off tasks based on the special needs of a particular file they have been given. For example, a name-and-address file could contain a mix of three-line and four-line address records, with a mixture of 5-digit and 9-digit zip codes. It might be necessary to find the last line of each record and extract a five-digit zip, in order to sort the list by zip code. Ordinary mail-merge programs can’t deal with data issues like those, so a program has to be written—just a few lines of code, to be sure, but a program all the same.
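
    The task described above can be sketched in a few lines. This is a minimal illustration in Python, used here only as a stand-in for Perl or Ruby; the sample records, the blank-line record separator, and the `zip5` helper are assumptions made for the example, not details of any real customer file.

```python
import re

# Hypothetical input: address records separated by blank lines,
# with three or four lines each and a 5- or 9-digit ZIP on the last line.
RAW = """Jane Doe
12 Elm St
Springfield, IL 62704-1234

Acme Corp
Attn: Bob Ray
500 Main St Suite 4
Albany, NY 12207

Lee Wong
9 Oak Ave
Boston, MA 02108
"""

# Five digits at the end of the line, optionally followed by a
# four-digit extension written with or without a hyphen.
ZIP_AT_END = re.compile(r"\b(\d{5})(?:-?\d{4})?\s*$")

def zip5(record):
    """Return the five-digit ZIP from the last line of a record, or '' if none."""
    last_line = record.strip().splitlines()[-1]
    match = ZIP_AT_END.search(last_line)
    return match.group(1) if match else ""

# Split on blank lines and drop any empty chunks.
records = [r for r in re.split(r"\n\s*\n", RAW) if r.strip()]
# Sort on the ZIP as text so leading zeros (e.g. 02108) are preserved.
sorted_records = sorted(records, key=zip5)

for record in sorted_records:
    print(zip5(record), "|", record.splitlines()[0])
```

    Records with no recognizable ZIP get an empty key and therefore sort to the front, where they are easy to spot and fix by hand.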

    For a job like that, it is possible to use almost any programming language. Some people would use the scripting language in Excel, especially if the incoming file is an Excel file. Similarly, the programming capabilities within MS Access could be used. And of course general-purpose languages like C++ and Java could be used. But a simpler language with good text-processing routines built in, especially an interpreted one like Perl or Ruby, is often the most efficient tool. And as the tasks become just a bit more complex than the one outlined above, Perl and Ruby really start to shine.

    Just to be clear: Nicholas’ advice is aimed at people who have some programming experience but are not VDP veterans. It would be smart for them to look at Perl and Ruby as their main tools. Michael J’s comment raises the concern that non-programmers might get the idea that learning Perl or Ruby will be easy. It won’t. Even though Perl is a lot easier to get started with than C++, and even though a Perl program will often need only one line of code for every five in C++, it’s still programming. If you’ve never done it, it will be hard.

    It occurs to me that there is a book waiting to be written, telling how to apply Perl or Ruby to common VDP tasks. Nicholas, are you the person to write it?

  3. Paul O'Brien

    It’s amazing to me that one of the most powerful VDP solutions available, at an incredibly cheap implementation cost, is never mentioned: XSL with XSL-FO. With tools like Stylus Studio or Altova MapForce, one can easily build transforms for just about any datastream – fixed field, delimited or XML – that turn the data into a common XML schema for use with a VDP workflow, with virtually no programming necessary. Companies like RenderX and Ecrion have very good GUI-based tools for designing your printed pieces. These tools can ingest this datastream and create the necessary XSL-FO datastream, which can then be rendered to PDF, PostScript, AFP, etc. by RenderX, Ecrion or Antenna House engines. Their cost is in the low thousands, a small fraction of what some of the more common VDP rendering engines such as Pageflex, PReS or Adobe InDesign Server run.

  4. Michael J

    Paul,

    Just to underline your point. Back in the day, around 1998, I was the printing adviser to a dot.com that created completely personalized workbooks and websites for high school students. The dot.com was later purchased by McGraw-Hill.

    The workflow was XML via XSL and XSL-FO to print. At the time, it needed very skilled, experienced people to get it done. It strikes me as a bit like color separations. First CEPS, then desktop + CEPS, then desktop, then shrink-wrapped software.

  5. George Alexander

    Paul, I am intrigued by your approach. I know it is possible to do many VDP projects the way you suggest, and I recognize that the tools are available and inexpensive. But how easy will it be for the average VDP programmer to adopt this approach? Doesn’t it require some pretty substantial re-training?

    I’m thinking about hurdles like these:
    1. You have to devise an XML schema that will work for a given project. I suspect not too many people actually know how to do that.
    2. You have to get your data into that schema. Is it as easy as you imply?
    3. You have to figure out how to use the resulting XML file in conjunction with one of the XSL-FO formatting tools. Have these evolved to the point that it is relatively straightforward to come up with a page design that exactly matches an InDesign layout we have been given? Can we work with these tools if we don’t know exactly what a “flow object” is?

    In my (very limited) experience, these tools seem to be understood only by techies with deep XML background. I haven’t encountered books that explain them for the Quark/InDesign crowd, nor have I seen these vendors exhibiting at printing industry shows. How would a VDP programmer without your background get started in this direction?

  6. Michael J

    George,
    On the MapForce website they have a 30-day free download. It might be interesting to give it a run: http://www.altova.com/mapforce.html

    FYI: here’s what the site says:
    To use MapForce, simply open data sources and targets as mapping components, drop in data processing functions from the customizable libraries, and then drag connecting lines between the nodes you wish to associate. Mapping output is created in real-time. For XML and database mappings you can even view and save the XSLT 1.0/2.0 or XQuery execution code. Or, with just a click of the mouse, you can choose between Java, C++, or C# to automatically generate a turnkey application from your design. This way, you can implement data integration and Web services applications without writing any code. MapForce-generated code is royalty-free, so you may deploy it without additional fees or deployment adaptors. Features and functionality in the MapForce data mapping tool include:

  7. Paul O'Brien

    Hi George,

    There will definitely be some retraining – I think the difference is that instead of learning a proprietary language, the skill set is mostly XML, XSL and XPath, and possibly XQuery. These technologies are becoming more and more applicable to many data processing applications. A good example is an update I need to do. I get a CSV file of tax data – I transform this to an XML file and then apply an XSL transform to turn the data into SQL statements that are run to update the database – all done with a fairly simple VBScript and 2 small XSL documents. If you have a schema for your XML documents, it takes about 4 or 5 lines of VBScript to “validate” the document. The validation can be very precise, depending on how well your schema is designed. For example, I use a regular expression to check a specific product id element that must match an exact format of characters and numbers.
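
    The first step of that pipeline (CSV to XML) and the regular-expression check can be sketched as follows. This is a minimal illustration in Python rather than the VBScript and XSL described above: Python's standard library has no XSLT processor or XSD validator, so the XSL transform to SQL is not shown and a plain regex test stands in for the schema's pattern check. The column names, the product id format, and both helper functions are hypothetical.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical CSV of tax data; the column names are invented for illustration
# and must be valid XML element names for this simple mapping to work.
CSV_TEXT = """product_id,region,tax_rate
AB-1234,NY,0.08
XY-0042,NJ,0.066
bad id,CT,0.0635
"""

# Assumed product id format: two capital letters, a hyphen, four digits.
PRODUCT_ID = re.compile(r"[A-Z]{2}-\d{4}")

def csv_to_xml(csv_file):
    """Build <rows><row><column>value</column>...</row></rows> from a CSV file."""
    root = ET.Element("rows")
    for record in csv.DictReader(csv_file):
        row = ET.SubElement(root, "row")
        for name, value in record.items():
            ET.SubElement(row, name).text = value
    return root

def invalid_product_ids(root):
    """Return product_id values that do not match the expected format."""
    ids = [element.text or "" for element in root.iter("product_id")]
    return [pid for pid in ids if not PRODUCT_ID.fullmatch(pid)]

tree = csv_to_xml(io.StringIO(CSV_TEXT))
print(ET.tostring(tree, encoding="unicode"))
print("Invalid ids:", invalid_product_ids(tree))
```

    In the workflow described above, the same constraint would normally live in the schema itself, so that any validating tool applies it.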

    With tools like Stylus Studio, creating a schema is actually pretty simple. I build a representative XML file with all of the possible elements in it that I want and then use their “Create Schema from XML” tool that builds the xsd file for you. I normally go back and tweak a few things but it gets you about 90-95% of the way there.

    Once you have a “source” schema and a “target” schema – building an XSL transform is as easy as drawing lines from elements in one schema to elements in the other. You typically specify an example document with real data in it and as you go along, you can at any time run the transform to see the results based on what you’ve done so far.

    The better XSL-FO design tools are very powerful and let you literally drag and drop elements and segments into the page. They allow you to create documents with static pages, or documents such as bills whose data may span many pages. Again, these tools will normally get you pretty far along, while allowing you to drop into “coding” mode to make minor tweaks.

    XSL-FO itself allows for very precise positioning of information on a page. We have developed a schema to represent index tab information and a corresponding XSL-FO transformation that takes a “tabDoc” and creates printable PDFs for our index tab operation – the copy must be very precisely positioned. We have a desktop tab design application, and are working on a web-based version, that creates the “tabDoc” – but we could just as easily take a formatted data source, say an Excel file, build a transform that would turn it into a “tabDoc”, and drop it right into the same XSL-FO workflow.

    As I said in my first post, I’m continually amazed at how underexposed this technology is. My only guess is that since it’s really kind of an “open source” type solution, there aren’t any big companies out there pushing it. Adobe would much rather sell you an InDesign Server at $25K than let you know that you can create an equivalent solution for around $3-4K.

    As far as getting started – I’d probably buy Stylus Studio and start working with XML, schemas (XSD files), XSL and XPath to learn the technology – it’s MUCH EASIER than learning a programming language. Next I would play around with building your own simple XSL-FO transforms – maybe create a transform that would take contact information and create PDFs of business cards. You can get trial versions of the RenderX or Antenna House renderers once you get to this stage and play around for a month or so.

    The other piece of this puzzle is creating workflows. If you haven’t heard of PowerSwitch – get someone at Enfocus to give you a demo – WOW – this pulls it all together. We’re currently working on a flow where we get a fixed-format billing file. We build a transform with Stylus Studio to turn it into an XML file, which we validate against a schema for data integrity. We then “normalize” this data to a common mailing schema for pre-sorting, pre-sort it, and then transform it to XSL-FO for output on our Digimasters. Other than a little bit of VBScript, there is virtually no traditional programming involved.

  8. David Smith

    Most excellent discourse, gents. I am certainly going to explore Stylus Studio and MapForce. One indispensable tool for me over the last 19 years has been FileMaker Pro. I have been producing high-quality versioned output and variable content products since the first data-aware plug-ins for Quark were released in 1989. FileMaker Pro Developer edition (now called Advanced) is $499. It is relational, completely cross-platform, XML aware, SQL compatible, and capable of producing high-quality PDF output on its own. I have even used it for successful cross-media campaigns. When used in a workflow, it is easily scripted to pick up data, perform transformations, and output the resulting files directly into a hot folder or event-triggered workflow. It can also produce interactive websites through its instant publishing method, or much more complex sites through its built-in PHP generator. Together with a JavaScript-scriptable application like FusionPro ($599.00) or even InDesign itself, you have a very powerful set of technologies that can accommodate a wide variety of cross-media and advanced variable imaging projects that require little programming smarts, just a willingness to test and learn. There are lots of ways to “skin the cat”, and with a little hard work and creativity, applications can sometimes be built with off-the-shelf toolsets. I am certainly not discounting any of the suggestions above, as they can provide some very robust applications. I just want to suggest that there are lots of tools in the shed, some of which you may already have.

  9. Paul O'Brien

    David,

    You’re spot on with FileMaker Pro! This was our first VDP solution and still comes in very handy for “quick-and-dirty” solutions. It should be in every VDPer’s bag of tricks.
