Category Archives: eXist

XML Prague Is Over

This year’s XML Prague is over and I’m writing this at the Munich airport on my way back home. My brain is still hurting.

The conference was fabulous, as always. Among the highlights were Gerrit Imsieke’s awesome XSLT trickery for splitting XML, Steven Pemberton’s walk-through of his Invisible XML spec, and Michael Piotrowski’s nostalgic look back at SGML. But my personal favourite has to be Adam Retter’s introduction to his new Fusion DB XML (and NoSQL) database that I think just might prove to be a game-changer. He’s launching it in June at Markup UK in London – another great reason for everyone to join us there!

I also gave a paper at XML Prague, about merging two XML sources of the Swedish Code of Statutes, also known as SFS, a project I’ve been busy with for the last eight months or so. It’s been quite a ride, and if you’re interested, have a look at the XML Prague proceedings. There are a lot of other good papers there, too.

Mr Smith Goes to Washington

My paper submission to this year’s Balisage conference was accepted. It’s about an eXist implementation I did for the Federation of Swedish Farmers (LRF), and while I may not be completely objective, I think the system is very cool. From the conference blurb:

The Federation of Swedish Farmers (LRF) provides its 170,000 members with a web-based service to check compliance with state and EU farming regulations. These checklists are also produced nightly both as generic checklists with more than 130 pages and as individualised checklists for registered members. The system consists of an eXist database coupled with oXygen Author. The checklists and their related contents are edited, stored, and processed, published as PDFs, and exported to the SQL database which stores member registration, feeds the website, and does various other tasks. The system uses XQuery, XSLT, XInclude modularization, an extended XLink linkbase, and other markup technologies. It currently handles more than 40,000 PDF documents a year and many more than that in the web-based forms.

This is the second version of the LRF system. The first, presented at XML Prague in 2013, was XProc-based and represented my somewhat naive trust in the state of XProc in eXist. I rewrote the new one in XQuery, having tested (and failed miserably at using) the XProc module that is now available. XProc in eXist, sadly, is not yet ready for prime time.

Be that as it may, I’m really pleased about both the system and my paper, and hope to see you there.

Feeling Like A (Real) Programmer

I’ve spent most of tonight writing an XQuery script that reads stuff from a linkbase I’m using to keep track of resources in eXist. It’s not much yet, just a couple of queries to get resource URIs based on various conditions, but it strikes me that doing an extended XLink implementation in eXist really shouldn’t be that hard. Even by a non-programmer such as yours truly.
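To give a rough idea of what “get resource URIs based on various conditions” can look like against an extended XLink linkbase, here is a minimal sketch. It is in Python rather than XQuery purely for illustration, and the element names (`linkbase`, `link`, `loc`, `arc`), the labels, and the `/db/...` paths are all made up; only the `xlink:*` attributes come from the XLink spec. It also assumes every label is used by exactly one locator, which XLink itself does not require.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"

# Hypothetical linkbase: three locators and two arcs from "checklist".
LINKBASE = f"""\
<linkbase xmlns:xlink="{XLINK}">
  <link xlink:type="extended">
    <loc xlink:type="locator" xlink:href="/db/docs/checklist.xml" xlink:label="checklist"/>
    <loc xlink:type="locator" xlink:href="/db/docs/intro.xml" xlink:label="intro"/>
    <loc xlink:type="locator" xlink:href="/db/images/logo.png" xlink:label="logo"/>
    <arc xlink:type="arc" xlink:from="checklist" xlink:to="intro"/>
    <arc xlink:type="arc" xlink:from="checklist" xlink:to="logo"/>
  </link>
</linkbase>
"""


def q(name):
    """Return the namespace-qualified form of an xlink attribute name."""
    return f"{{{XLINK}}}{name}"


def targets_of(linkbase_xml, label):
    """Return the URIs that the resource labelled `label` links to."""
    root = ET.fromstring(linkbase_xml)
    # Map each locator label to its URI (assumes labels are unique).
    hrefs = {
        el.get(q("label")): el.get(q("href"))
        for el in root.iter()
        if el.get(q("type")) == "locator"
    }
    # Follow every arc that starts at the given label, in document order.
    return [
        hrefs[el.get(q("to"))]
        for el in root.iter()
        if el.get(q("type")) == "arc" and el.get(q("from")) == label
    ]


print(targets_of(LINKBASE, "checklist"))
```

The same lookup in XQuery would be a couple of path expressions over the linkbase document; the point is only that the queries stay small.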

ProXist v2

For the last few days, I’ve been busy updating ProXist, my XProc abstraction layer and app for eXist. There is a new(-ish) XProc package for eXist that promises to support a lot of the capabilities of Calabash, Norm Walsh’s XProc engine, so I decided it was time to test it out and do a new ProXist at the same time.

My XML Prague ProXist version supported only my custom document type and stylesheets, and barely those at that. It was meant to be a demo. For the new version, though, I’m thinking of doing a default implementation for DocBook, including some of the more commonly used stylesheets and a couple of standard pipelines, packaged so they can be used with ProXist – it should be a question of writing a ProX blueprint XML file, theoretically, plus something that helps me list DocBook resources included using XInclude.

At the same time, I’m finally updating the ProXist documentation. It’s written using DocBook, incidentally, and now part of the git repository.

ProXist is not even close to being finished, but at least I seem to have moved on from procrastinating to actually doing something.

Submitted My Final Balisage Edit

I submitted the final edit of my Balisage paper, Multilevel Versioning for XML Documents, the other day. While I did try to shorten it (I seem to be unable to produce a short paper) and, of course, correct problems and mistakes pointed out by reviewers, there were no radical changes, and so I am forced to draw one of two possible conclusions:

I am deluded and simply don’t know what I’m talking about. This is an awful feeling and happens to me a lot after submitting papers.

The paper suggests something that might actually work.

(There is a third conclusion, obviously, one that is a mix of the two, but let’s not go there.)

My paper is about a simple versioning scheme for the eXist XML database, built on top of the versioning extension that ships with it. Its main purpose is to provide granularity to versioning, to provide an author of XML documents with a method to recognise significant new versions as opposed to the long series of saves, each of which comprises a new eXist version.

On the surface of it, my scheme is a typical multilevel versioning system, with integers, decimals, centecimals, etc. (1, 1.1, 1.1.1, 1.1.2, 1.1.3, 1.2, …) identifying a level of granularity. The idea is that the lowest level (centecimal, in this case) denotes actual edits while the levels above identify significant new versions. Nothing new or fancy, in other words. What is new (to me, at least; I have not seen this suggested elsewhere) is how the scheme is handled in eXist.

I’m proposing that each level is handled in its own collection, each using eXist’s versioning extension to keep track of new versions in the respective collections. When a level change occurs (for example, if a new centecimal version such as 1.3.1 is created from 1.3), the new version is created using a simple copy operation from the decimal collection to the centecimal collection. The operation itself (in this case, a check-out from a decimal version to a centecimal version) is tracked in an XML file that logs each such operation and maps the eXist “standard” version to the new integer, decimal or centecimal revision.
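The check-out described above can be sketched in a few lines. This is a toy model in Python, not the eXist implementation: the `Store` class, its dictionaries standing in for collections, and the log entry fields are all hypothetical, and the eXist revision number is simply passed in rather than read from the versioning extension.

```python
from dataclasses import dataclass, field

# Level names as used in the text, from most to least significant.
LEVELS = ("integer", "decimal", "centecimal")


def level_of(version):
    """Return the level name for a version string such as '1', '1.3' or '1.3.1'."""
    depth = len(version.split("."))
    if not 1 <= depth <= len(LEVELS):
        raise ValueError(f"unsupported version depth: {version}")
    return LEVELS[depth - 1]


@dataclass
class Store:
    # One dictionary per level stands in for one eXist collection per level.
    collections: dict = field(default_factory=lambda: {name: {} for name in LEVELS})
    # Stand-in for the XML file that logs check-outs.
    log: list = field(default_factory=list)

    def add(self, version, content):
        """Store a document under the collection that matches its version."""
        self.collections[level_of(version)][version] = content

    def check_out(self, source, exist_revision):
        """Copy `source` one level down, e.g. 1.3 -> 1.3.1, and log the operation."""
        source_level = level_of(source)
        index = LEVELS.index(source_level)
        if index + 1 >= len(LEVELS):
            raise ValueError(f"{source} is already at the lowest level")
        target_level = LEVELS[index + 1]
        target = self.collections[target_level]
        # Pick the next free number below the source version.
        n = 1
        while f"{source}.{n}" in target:
            n += 1
        new_version = f"{source}.{n}"
        # The "simple copy operation" between collections.
        target[new_version] = self.collections[source_level][source]
        self.log.append({
            "operation": "check-out",
            "from": source,
            "to": new_version,
            "exist-revision": exist_revision,  # assumed to come from eXist
        })
        return new_version


store = Store()
store.add("1.3", "<doc/>")
print(store.check_out("1.3", exist_revision=42))
print(store.log)
```

A check-in in the other direction (promoting a centecimal version to a new decimal one) would be the mirror image: copy upwards, pick the next free number at that level, and add another log entry.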

A related task for the XML file is to map the name of the resource to its address; the XML file’s other big purpose is to provide the resources with a naming abstraction so a named resource in a specific significant version can be mapped to an address on the system. I propose using URNs, but most naming conventions should work just as well.
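As an illustration of the naming abstraction, the sketch below resolves a URN plus a significant version to an address by reading a small version map. The map’s structure (`version-map`, `resource`, and the attribute names), the `urn:x-example:` names, and the `/db/...` paths are invented for the example; the paper’s actual format may look quite different. Python is used only as a stand-in for the XSLT or XQuery that would do this in eXist.

```python
import xml.etree.ElementTree as ET

# Hypothetical version map: one entry per named resource and significant version.
VERSION_MAP = """\
<version-map>
  <resource urn="urn:x-example:doc:manual" version="1"
            href="/db/versions/integer/manual.xml" exist-revision="3"/>
  <resource urn="urn:x-example:doc:manual" version="1.2"
            href="/db/versions/decimal/manual.xml" exist-revision="17"/>
  <resource urn="urn:x-example:doc:manual" version="1.2.1"
            href="/db/versions/centecimal/manual.xml" exist-revision="18"/>
</version-map>
"""


def resolve(map_xml, urn, version):
    """Return the address for a named resource in a given version, or None."""
    root = ET.fromstring(map_xml)
    for entry in root.iter("resource"):
        if entry.get("urn") == urn and entry.get("version") == version:
            return entry.get("href")
    return None


print(resolve(VERSION_MAP, "urn:x-example:doc:manual", "1.2"))
```

Because callers only ever see the name and the version, the collections behind the `href` values can be reorganised without touching anything that refers to the resource.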

Implementation-wise, the XML version map abstraction is very attractive to me as a non-programmer (or rather, someone whose toolkit of programming languages is mostly restricted to those commonly associated with XML technologies), as I believe most of the operations can be implemented in XSLT and XQuery.

But I’m not there yet. I’ve submitted the final paper and now, I have to produce a sufficiently convincing presentation on the subject.

The presentation is on Tuesday, August 5th, and I’d love to see you there.

ProXist Documentation, Etc

My XProc abstraction thingy for eXist, ProXist, is not the most well-documented open source project there is, but at least there is now something to read. It’s a little something in DocBook, just a first draft and terribly incomplete, but something that I’m hoping to make more complete, given enough time.

I also feel it’s time to package ProXist as an eXist app rather than a set of misplaced collections.

Oh, and…

…most of the ProX stuff is available at Github. Not the eXist web pages, yet, but that’s because I’m still experimenting with them and there’s some work left. There’s the Balisage demo, and there’s the basic ProXist stuff, with pipelines and XQueries and such, and there’s the authoring environment (with Relax NG schema, FO, etc), but no instructions on how to get any of it to run, yet.

I have a test app running locally, a little something that is about as simple as I can make it, but since I am not a web developer (I’m a markup geek), the HTML is awkward, the CSS nonexistent apart from the default eXist stuff, and the XQueries somewhat painful. I do think it’s going to be pretty cool, though, and look forward to presenting it at XML Prague.

ProXist and My XML Prague Paper

I recently submitted the final version of my XML Prague whitepaper about my eXist implementation of ProX, called ProXist (with apologies for the tacky name). While I’m generally pleased with the paper, the actual demo implementation I am going to present at the conference is not quite finished yet and I wish I had another week to fill in the missing parts.

Most of the ProXist stuff works but there are still some dots to connect. For example, something that currently occupies the philosophical part of my brain has to do with how to run the ProX wrapper process, the one that configures the child process that actually does stuff to the input. ProX, so far, has been very much about automation and about things happening behind the scenes, and so I have aimed for as few end user steps as possible.

My Balisage ProX demo was a simple wrapper pipeline that did what it did in one go. Everything was fitted inside that pipeline: selecting the input, configuring the process that is to be applied to the input in an XForm, postprocessing the configured process and converting it to a script that will run the child process, running the child process, saving the results. Everything.

But the other day, while working on the eXist version and toying with its web application development IDE, it dawned on me that there doesn’t have to be a single unified wrapper process. If its components are presented on a web page and every one of them includes logic to check if the information from a required step is available or not (for example, a simple check to confirm that an input has been chosen before the process can be configured), they don’t have to be explicitly connected.

The web page that presents the components (mainly, selecting input and configuring the process to be applied on the input) then becomes an implicit wrapper. The user reads the page and the presentation order and the input checks are enough. There is no longer a need for a unified wrapper process.
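The implicit wrapper can be modelled as a list of components that each declare what they need, with the page simply showing whichever ones have their prerequisites met. The sketch below is a hedged illustration in Python, not ProXist code; the component names and the `state` keys are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    requires: tuple = ()  # keys that must be present before this step is usable


# Listed in presentation order; nothing connects them explicitly.
COMPONENTS = [
    Component("select-input"),
    Component("configure-process", requires=("input",)),
    Component("run-process", requires=("input", "config")),
]


def enabled_components(state):
    """Return the names of the components whose required information is available."""
    return [
        component.name
        for component in COMPONENTS
        if all(key in state for key in component.requires)
    ]


print(enabled_components({}))
print(enabled_components({"input": "doc.xml"}))
print(enabled_components({"input": "doc.xml", "config": {"output": "pdf"}}))
```

Each component only checks its own prerequisites, so adding or reordering steps means editing the list, not rewriting a wrapper pipeline.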

Now, you may think this is obvious, and I have to admit that it now seems obvious to me, too. But I sometimes can’t move from one mindset (for example, that automation bit I mentioned above) to another (such as the situation at hand, the actual environment I implement things in) as easily as I would like. Whether this is because I’m getting older or because it’s who I am, I don’t know. In this particular case, I was so convinced that the unified wrapper was the way to go that it got in the way of a better solution.

At least I think it’s a better solution. If it isn’t, hopefully I can change my mind again and in time.

See you at XML Prague.

TIC 2013

I co-presented a paper about the oXygen/eXist solution I’ve been involved in building for The Federation of Swedish Farmers – LRF – at the TIC 2013 conference in Stockholm, Sweden. My co-presenter was Anders Johannesson from LRF, who is a brilliant, brilliant presenter. He is knowledgeable, funny and supremely engaging, and I had loads of fun.