Category Archives: XProc

XML Prague Week

As it turns out, XML Prague was rather eventful.

For me, the week began with a very productive two-day XProc workshop. I’m part of the W3C Community Group that is producing a 3.0 version of the XProc specification. I’m pleased to report that we made a lot of progress. There is going to be a candidate release of the spec (multiple specs, actually) in the spring, and alpha releases of two XProc implementations, XML Calabash and Morgana XProc, in June, coinciding with an XML conference in London.

Which brings me to the next item: There won’t be an XML London this year (Charles Foster decided not to organise one), but instead, we announced Markup UK during XML Prague, to be held on June 9-10. I am organising the conference together with Geert Bormans, Tom Hillman, and Andrew Sales. Details will follow ASAP. Watch this space (and the conference website, obviously).

As for XML Prague itself, it was as great as always. Great talks, great people, great food, great beer.

Mr Smith Goes to Washington

My paper submission to this year’s Balisage conference was accepted. It’s about an eXist implementation I did for the Federation of Swedish Farmers (LRF), and while I may not be completely objective, I think the system is very cool. From the conference blurb:

The Federation of Swedish Farmers (LRF) provides its 170,000 members with a web-based service to check compliance with state and EU farming regulations. These checklists are also produced nightly, both as generic checklists with more than 130 pages and as individualised checklists for registered members. The system consists of an eXist database coupled with oXygen Author. The checklists and their related contents are edited, stored, processed, published as PDFs, and exported to the SQL database that stores member registrations, feeds the website, and performs various other tasks. The system uses XQuery, XSLT, XInclude modularization, an extended XLink linkbase, and other markup technologies. It currently handles more than 40,000 PDF documents a year and many more than that in the web-based forms.

This is the second version of the LRF system. The first, presented at XML Prague in 2013, was XProc-based and represented my somewhat naive trust in the state of XProc in eXist. I rewrote the new one in XQuery, having tested (and failed miserably at using) the XProc module that is now available. XProc in eXist, sadly, is not yet ready for prime time.

Be that as it may, I’m really pleased about both the system and my paper, and I hope to see you there.

ProXist v2

For the last few days, I’ve been busy updating ProXist, my XProc abstraction layer and app for eXist. There is a new(-ish) XProc package for eXist that promises to support many of the capabilities of Calabash, Norm Walsh’s XProc engine, so I decided it was time to test it out and do a new ProXist at the same time.

My XML Prague ProXist version supported only my custom document type and stylesheets, and barely those at that. It was meant to be a demo. For the new version, though, I’m thinking of doing a default implementation for DocBook, including some of the more commonly used stylesheets and a couple of standard pipelines, packaged so they can be used with ProXist. Theoretically, it should only be a question of writing a ProX blueprint XML file, plus something that helps me list the DocBook resources included using XInclude.
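As a rough illustration of the “list the DocBook resources included using XInclude” part, here is a minimal Python sketch. It is not ProXist code; the function name `list_xincludes` and the sample document are hypothetical, and it only reads the top-level file, without following nested includes.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C XInclude recommendation
XI_NS = "http://www.w3.org/2001/XInclude"


def list_xincludes(xml_text: str) -> list[str]:
    """Return the href of every xi:include element, in document order.

    Hypothetical helper: it does not resolve or follow the included files.
    """
    root = ET.fromstring(xml_text)
    return [
        el.get("href")
        for el in root.iter(f"{{{XI_NS}}}include")
        if el.get("href") is not None
    ]


# Hypothetical DocBook 5 book that pulls in two modules
doc = """<book xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude" version="5.0">
  <title>Example</title>
  <xi:include href="intro.xml"/>
  <chapter>
    <title>Two</title>
    <xi:include href="parts/setup.xml" parse="xml"/>
  </chapter>
</book>"""

print(list_xincludes(doc))  # ['intro.xml', 'parts/setup.xml']
```

A real version would probably recurse into each included file and resolve the hrefs against the including document’s location.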

At the same time, I’m finally updating the ProXist documentation. It’s written using DocBook, incidentally, and is now part of the git repository.

ProXist is not even close to being finished, but at least I seem to have moved on from procrastinating to actually doing something.

ProXist and My XML Prague Paper

I recently submitted the final version of my XML Prague whitepaper about my eXist implementation of ProX, called ProXist (with apologies for the tacky name). While I’m generally pleased with the paper, the actual demo implementation I am going to present at the conference is not quite finished yet and I wish I had another week to fill in the missing parts.

Most of the ProXist stuff works but there are still some dots to connect. For example, something that currently occupies the philosophical part of my brain has to do with how to run the ProX wrapper process, the one that configures the child process that actually does stuff to the input. ProX, so far, has been very much about automation and about things happening behind the scenes, and so I have aimed for as few end user steps as possible.

My Balisage ProX demo was a simple wrapper pipeline that did what it did in one go. Everything was fitted inside that pipeline: selecting the input, configuring the process that is to be applied to the input in an XForm, postprocessing the configured process and converting it to a script that will run the child process, running the child process, saving the results. Everything.

But the other day, while working on the eXist version and toying with its web application development IDE, it dawned on me that there doesn’t have to be a single unified wrapper process. If its components are presented on a web page and every one of them includes logic to check if the information from a required step is available or not (for example, a simple check to confirm that an input has been chosen before the process can be configured), they don’t have to be explicitly connected.

The web page that presents the components (mainly, selecting input and configuring the process to be applied on the input) then becomes an implicit wrapper. The user reads the page, and the presentation order and the input checks are enough. There is no longer a need for a unified wrapper process.
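The idea of an implicit wrapper can be sketched in a few lines of Python. This is an analogy under stated assumptions, not the eXist implementation: `Session`, `Component`, and the `"input"`/`"process"` keys are hypothetical stand-ins for whatever state the web page keeps. Each component guards itself by checking its prerequisites, and nothing calls the components in sequence.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical per-user state shared by the components on the page."""
    data: dict[str, str] = field(default_factory=dict)


@dataclass
class Component:
    name: str
    provides: str        # key this component stores when it has run
    requires: list[str]  # keys that must already exist before it may run

    def ready(self, session: Session) -> bool:
        return all(key in session.data for key in self.requires)

    def run(self, session: Session, value: str) -> bool:
        # Refuse to run until the information from the required steps is available
        if not self.ready(session):
            return False
        session.data[self.provides] = value
        return True


# The page lists the components in presentation order; no wrapper connects them
select_input = Component("Select input", provides="input", requires=[])
configure = Component("Configure process", provides="process", requires=["input"])
```

Used out of order, the configuration step simply declines to run until an input has been chosen:

```python
s = Session()
configure.run(s, "pdf")            # False: no input selected yet
select_input.run(s, "manual.xml")  # True
configure.run(s, "pdf")            # True
```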

Now, you may think this is obvious, and I have to admit that it now seems obvious to me, too. But I sometimes find it hard to move from one mindset (for example, that automation bit I mentioned above) to another (such as the situation at hand, the actual environment I implement things in) as easily as I would like. Whether this is because I’m getting older or just who I am, I don’t know. In this particular case, I was so convinced that the unified wrapper was the way to go that it got in the way of a better solution.

At least I think it’s a better solution. If it isn’t, hopefully I can change my mind again and in time.

See you at XML Prague.


I’ve been working on an eXist-based implementation of my XProc abstraction layer, ProX, hoping to have something that runs before XML Prague, next month. It turns out that the paper I submitted on the subject has been accepted, so now I guess I just have to.

The ProX implementation should not be terribly complicated to finish, but until recently it risked being rather hackish (is that a word?) because the XMLCalabash eXist module written by Jim Fuller was rather primitive: it only supported specifying the pipeline to run and a single, hard-coded output port. I foresaw a more or less complete rewrite of the ProX wrapper in XQuery.

Luckily, Jim very graciously agreed to rewrite his module into something more immediately usable. I received the first updated module to test in December and the most recent update just a few days ago. He also found a bug in Calabash’s URI handling and sent a first fix to me along with the updated module. There are still things to do but for me, Christmas came really early this year.

Oh, and I’m calling the implementation, and the paper, ProXist. Sorry about that.

Open-source ProX

I recently got the go-ahead from my boss at Condesign to open-source ProX, my XML processing XML, and its first implementation. It sounds like rather more than it actually is (right now there’s a wrapper pipeline, an XForm, some XSLT and an example DTD), but I happen to think ProX is pretty cool and potentially useful.

I’ll make the stuff available on GitHub as soon as I have the time, of course with a proud announcement here. In the meantime, you can get an idea about what ProX is by reading my Balisage papers ProX: XML for interfacing with XML for processing XML (and an XForm to go with it) and Using XML to Implement XML.

Not One But Two Papers Accepted

Both of my papers submitted to Balisage were accepted. I feel honoured and somewhat nervous.

My second paper is a progress report of sorts about ProX, my XML processing XML. I think it’s going to be very cool, especially because I will have an implementation to show. I finished the wrapper pipeline that runs everything just the other day, and one day very soon that wrapper will do things with a live ProX (my processing XML format) document, including some actual publishing.

As the Balisage blurb says, life is good.

Processing XML with Process XML

I presented my ideas on processing XML using XML at Balisage earlier this year. While there, I actually demo’d converting my Process XML draft to a FreeMind-based user interface at the MarkLogic-sponsored demo jam. Well, it wasn’t so much a user interface as a representation of the XML that might be used to create one, but it was a start, and today I’ve finally taken it a few steps further.

Um, that’s not exactly true either. I’ve worked on my Process XML some more during the last few weeks, because I’m using it for a customer project. What started out as a DTD is now a RELAX NG compact schema that uses xml:base to ease processing, covers most of the current Calabash version (1.0.3-94, as I write this), and is actually useful.
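To show why xml:base can ease processing, here is a minimal Python sketch of how relative references resolve against inherited xml:base values, so that a document can state a base location once instead of repeating it in every path. The element names (`process`, `group`, `step`) and the URLs are hypothetical and are not taken from the actual Process XML schema.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# xml:base lives in the reserved XML namespace
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def resolve_hrefs(elem, base=""):
    """Yield (tag, absolute href) for each element with an href, honouring xml:base.

    Each element's xml:base (if any) is resolved against the base it inherits
    from its ancestors, as the XML Base specification describes.
    """
    base = urljoin(base, elem.get(XML_BASE, ""))
    href = elem.get("href")
    if href is not None:
        yield elem.tag, urljoin(base, href)
    for child in elem:
        yield from resolve_hrefs(child, base)


doc = ET.fromstring("""
<process xml:base="http://example.com/pipelines/">
  <step href="validate.xpl"/>
  <group xml:base="pdf/">
    <step href="docbook.xpl"/>
  </group>
</process>""")

for tag, url in resolve_hrefs(doc):
    print(tag, url)
```

Moving the whole set of pipelines then only means changing the outermost xml:base value.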

But today I wrote “live” Process XML, XSLT and pipelines that will make it a reality. The GUI will not happen for some time yet, because there is no need for one in the current implementation, but it’s going to be used for describing various XML-related processes that include XProc pipelines on an eXist server handling on-demand publishing.

And it’s very cool.

Balisage Impressions, At Long Last

I tend to write these “long time no post” posts from time to time. It’s a guilt thing, I suppose, and it’s how this post began life.

This time, though, I did have things to write about. There is the Balisage 2012 markup conference I attended two weeks ago, and it would be such a waste not to post something on it. I gave a paper there, my little something on how to implement XProc with more XML, and I even participated in MarkLogic’s demo jam with even more of the same. Great fun, that.

The most fun I had at Balisage had to do with listening to others give papers, however, with special mention having to go to Wendell Piez’s talk about how to process LMNL (non-XML) markup. LMNL is all about overlapping structures, the kind of thing that XML just won’t do, and it’s absolutely awesome. For some reason I’ve not given the overlap problem (or, for that matter, the related problem with discontinuous structures) much thought lately. I should have. LMNL, it seems to me, should be very useful for analysing dead languages such as Middle Egyptian where overlapping markup could be used to present alternative interpretations for grammar, pronunciation, and so on. There’s a paper begging to be written, right there. Next year, maybe.

It is good sometimes to remember that XML is not the answer to everything.

But there was more, a lot more. There were some excellent presenters, such as Steven Pemberton discussing abstraction errors (among others, in the C language), Norm Walsh with his compact XProc syntax proposal, and, of course, the undisputed king of keynotes, Michael Sperberg-McQueen, who, as Eric van der Vlist tweeted, “has a special gift to make each presenter feel clever in his closing keynotes.” And so many others.

And I really should mention Betty Harvey’s talk about implementing low-cost electronic documentation for a DoD contractor. In glorious SGML. I love history lessons, especially in my chosen field, and Betty’s was a stroll down memory lane.

Anyway, Balisage was fun and you really should have been there. Or maybe not if you aren’t into markup, but if so, why are you still reading this?


I’m going to spend the next week or two doing a test implementation of XProc for our document management system, Cassis TI. XProc, as some of you will know, is a pipeline language for XML processing, in the same vein as pipes in the *nix world. It’s intended to standardise and ease XML processing by treating the processing as a black box consisting of smaller black boxes; in other words, what is inside each box is less interesting than how its inputs and outputs are defined and used.
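The black-box idea can be illustrated with a small Python analogy. This is not XProc and not how Cassis TI works; `pipeline`, `drop_remarks` and `add_status` are hypothetical steps. Each step only promises to turn an XML tree into another XML tree, and the pipeline feeds each output into the next input, much like a *nix pipe.

```python
import xml.etree.ElementTree as ET
from typing import Callable

# A step is a black box: XML tree in, XML tree out
Step = Callable[[ET.Element], ET.Element]


def pipeline(*steps: Step) -> Step:
    """Compose steps so that each step's output becomes the next step's input."""
    def run(doc: ET.Element) -> ET.Element:
        for step in steps:
            doc = step(doc)
        return doc
    return run


def drop_remarks(doc: ET.Element) -> ET.Element:
    """Hypothetical step: remove every <remark> element."""
    for parent in list(doc.iter()):  # snapshot, so removals don't disturb iteration
        for child in list(parent):
            if child.tag == "remark":
                parent.remove(child)
    return doc


def add_status(doc: ET.Element) -> ET.Element:
    """Hypothetical step: mark the root element as final."""
    doc.set("status", "final")
    return doc


publish = pipeline(drop_remarks, add_status)
doc = ET.fromstring(
    "<article><para>Keep</para><remark>Draft note</remark>"
    "<section><remark>x</remark><para>Also</para></section></article>"
)
print(ET.tostring(publish(doc), encoding="unicode"))
# <article status="final"><para>Keep</para><section><para>Also</para></section></article>
```

Swapping a step for another one with the same kind of input and output leaves the rest of the pipeline untouched, which is the point of treating steps as black boxes.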

The test is about producing PDF output so it’s nothing fancy or new, but it’s important because I believe we can replace our current backend with an XProc-based processor, making things easier, faster and better for programmers and users alike.