Category Archives: XML Prague

On Reflection

Having reread my recent post on HTML5, I suppose I’d better stress the fact that it was never meant to be a commentary on HTML5 itself. It was a rant, something that happened because of the attitudes I think have increased in tandem with HTML5’s growing popularity. I really don’t know enough HTML5 to have an informed opinion beyond what I see suggested and discussed about it that is related to markup in general. My comments should be read in that light.

Take the addition of document semantics in HTML5 as a case in point. For example, the article and section tags are welcome additions, as are, in my humble opinion, the fairly minor redefinitions of the em and strong semantics. And there’s some important work being done in the area of extensible semantics for the web (see, for example, Robin Berjon’s XML Prague paper, Distributed Extensibility: Finally Done Right? and the Web Components web page on best practices), which turned out to be a heated topic at Balisage this year because quite a few of its participants are, like me, grumpy old men defending their own turf.

These are steps in the right direction, because they move away from the presentational horror that is the “old” HTML and to a more semantic web. Semantics is about meaning, and meaning is now being added to the web rather than simply empty but oh-so-cool visuals. I should add that some very cool visuals are being added, too, but in, and please pardon the joke, a meaningful way.

But, and maybe this is just me, I react when those steps are heralded as original and unique, never thought of before or at least never done right, and when history and past (and working) solutions are ignored or misinterpreted because they are part of a standard (XML or even SGML) that is regarded as failed. Google’s Domenic Denicola provided a case in point when he gave a controversial presentation on the subject, called Non-Extensible Markup Language, at Balisage; unfortunately, only the abstract seems to be available on their website.

That grumpy old men thing, above, is meant as a joke, of course, but I imagine there to be some truth in it. Part of the HTML5 crowd will certainly see it that way because they are trying to solve a very practical problem using a very pragmatic standard. HTML5 is, in part, about keeping old things working while adding new features, and it seems to do the job well. Having to listen to some older markup geeks argue about what XML was actually designed to do must seem utterly beside the point.

So, what to do? Well, I think it’s largely about education, both for the newer guys to read up on the historical stuff, and the older guys to try to understand why HTML5 is happening the way it is, and then attempting to meet halfway because I’m pretty sure it will benefit both.

Me, I’m in the midst of the reading up phase, and HTML5 – The Missing Manual is an important part of that.

I Should Probably…

Following this year’s Balisage conference, I should probably do a rewrite and update of the whitepaper I presented. It’s on my list of things to do.

On the other hand, I should do an eXist implementation of the version handling system I suggested in that paper. It’s also on my list of things to do.

But then again, I still have to finish my ProXist implementation, the one I presented at XML Prague. It is (you guessed it) on my list.

I have lots of good (well, I think so) ideas first presented at XML conferences, many of which deserve (well, I think so) to live beyond them. After all, a lot of work goes into the papers and the presentations, so shouldn’t I follow up on them more?

My version handling system, for example, should be easy enough to do in eXist. It doesn’t require a revolution, it doesn’t require me to learn how to code in Java, it should be enough to spend a couple of nights writing XQueries and XSLT to produce a usable first version.

ProXist is both simpler and more difficult to finish, but on the other hand, the basic application is already out there and works. It’s a question of rewriting it to be easier for others to test, which basically means redoing it as a standard eXist app.

Yet, instead of doing something about either of them, here I am, writing this blog post. It’s conference procrastination taken to yet another level.

And the next XML Prague deadline is coming up fast.

Linux Ready for the Desktop and All That

My recent XML Prague presentation ran from a Linux partition, the first time in a while I’ve used Linux for presenting anything. The reasoning was simple; I’d developed the accompanying demo on Linux, on a server on localhost, so it would be much easier to just write a presentation in Open Office than to move the demo to something else.

It wasn’t.

I’d fixed every bug in the demo, styled my web pages in an aesthetically pleasing manner (well, for me), and carefully prepared an XML Prague presentation project in oXygen with only the files I would need to show, making sure that they’d fit without scrolling when projected in a lower resolution. I’d bookmarked the important code, and I’d folded everything else. My demo was in great shape.

What I didn’t do beforehand (even though I actually meant to) was to test my Linux laptop in dual screen mode, mirroring the laptop screen to an external monitor using that lower projector resolution. That, of course, was what failed.

My talk was immediately after a coffee break so I figured I’d hook up my laptop immediately after the last talk before the break and test all this. How hard could it be?

Well, no mirroring in that lower resolution. Mirroring in a higher one (the laptop’s native resolution) was possible but of course, the projector wouldn’t work in that resolution. They usually don’t. Dual screen mode, outputting two different screens, didn’t work because I wouldn’t be able to see on my laptop’s screen what was being projected for the audience. I tested pretty much every setting there was but to no avail.

And then the (Gnome) window manager decided it couldn’t take the abuse any longer and crashed.

I rebooted into KDE, hoping it would fare better, but all I got for my troubles was another crash. Not the same software, mind, but something or other in KDE. I hadn’t really tried anything very dramatic, I’d simply changed the display modes a few times.

So I rebooted again and accepted my fate, booting into Gnome and using the dual screen mode where I’d be flying blind unless I twisted my head all the way back like that poor girl in The Exorcist, trying to run the demo from the laptop’s touchpad in front of me while hurting my neck to see the results on the large screen behind and above me.

If you’ve watched the conference video (second day, about 7 or 8 hours into the file), you now know why.

My laptop is not particularly fancy or modern. It’s a three-year-old Thinkpad with an Nvidia Optimus graphics card, the kind that includes what was then a high-end Nvidia card and a low-end Intel card, the idea being that you use the former for the graphics-intensive stuff while reserving the latter for the 2D desktop stuff. It still doesn’t work properly in Linux so I only use an Nvidia-only mode. It’s not something I blame the Linux developers for–the Optimus is proprietary and thus not something easily handled in open source–but it is what it is and quite common.

But other than that, there is nothing very special about my laptop. It just works, mostly. Well, it should.

So is Linux ready for the desktop yet?

This Year’s XML Prague…

…was fabulous. It always is, don’t get me wrong, but this one was the best yet. It’s all on video at the conference website, which, all things considered, is a pretty decent substitute if you were otherwise engaged, but Prague this time of year is the XML capital of Europe and the place to be.

For one thing, I think I finally actually understand some of the streaming part of the up-and-coming XSLT 3.0 spec, thanks to Abel Braaksma and Michael Kay, who both presented papers on the subject.

John Lumley presented a paper on lessons learned when finalising a standard library for XSLT/XPath extensions to manipulate binary data, a brilliant talk.

George Bina showed oXygen on mobile devices with the crowd cheering his every swipe of the iPad screen, in what was probably one of the most memorable demos ever at XML Prague.

And there was me, lastly (literally; I was the last scheduled speaker, right before a concluding interactive talk led by Robin Berjon), showing my ProXist demo. It all went surprisingly well, except for a slight problem with Linux and Gnome.

You should have been there.

Oh, and…

…most of the ProX stuff is available at Github. Not the eXist web pages, yet, but that’s because I’m still experimenting with them and there’s some work left. There’s the Balisage demo, and there’s the basic ProXist stuff, with pipelines and XQueries and such, and there’s the authoring environment (with Relax NG schema, FO, etc), but no instructions on how to get any of it to run, yet.

I have a test app running locally, a little something that is about as simple as I can make it, but since I am not a web developer (I’m a markup geek), the HTML is awkward, the CSS nonexistent apart from the default eXist stuff, and the XQueries somewhat painful. I do think it’s going to be pretty cool, though, and look forward to presenting it at XML Prague.

ProXist and My XML Prague Paper

I recently submitted the final version of my XML Prague whitepaper about my eXist implementation of ProX, called ProXist (with apologies for the tacky name). While I’m generally pleased with the paper, the actual demo implementation I am going to present at the conference is not quite finished yet and I wish I had another week to fill in the missing parts.

Most of the ProXist stuff works but there are still some dots to connect. For example, something that currently occupies the philosophical part of my brain has to do with how to run the ProX wrapper process, the one that configures the child process that actually does stuff to the input. ProX, so far, has been very much about automation and about things happening behind the scenes, and so I have aimed for as few end user steps as possible.

My Balisage ProX demo was a simple wrapper pipeline that did what it did in one go. Everything was fitted inside that pipeline: selecting the input, configuring the process that is to be applied to the input in an XForm, postprocessing the configured process and converting it to a script that will run the child process, running the child process, saving the results. Everything.

But the other day, while working on the eXist version and toying with its web application development IDE, it dawned on me that there doesn’t have to be a single unified wrapper process. If its components are presented on a web page and every one of them includes logic to check if the information from a required step is available or not (for example, a simple check to confirm that an input has been chosen before the process can be configured), they don’t have to be explicitly connected.

The web page that presents the components (mainly, selecting input and configuring the process to be applied on the input) then becomes an implicit wrapper. The user reads the page and the presentation order and the input checks are enough. There is no longer a need for a unified wrapper process.
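To make the idea a bit more concrete, here is a minimal sketch in Python (not the XQuery that ProXist actually uses) of components that are never explicitly connected. Each one only checks whether the information it needs is already available. The names Component, page_status and PAGE, and the placeholder values such as manual.xml, are made up for illustration and are not part of ProX or eXist.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Shared state: whatever earlier steps have produced, keyed by step name.
State = Dict[str, str]


@dataclass
class Component:
    """One block on the page; it knows only its own prerequisites."""
    name: str
    requires: List[str]
    action: Callable[[State], str]

    def is_ready(self, state: State) -> bool:
        # The "simple check": has every required step left its result?
        return all(key in state for key in self.requires)

    def run(self, state: State) -> bool:
        # Refuse to run until the prerequisites are in place.
        if not self.is_ready(state):
            return False
        state[self.name] = self.action(state)
        return True


def page_status(components: List[Component], state: State) -> List[Tuple[str, str]]:
    """List the components in presentation order with their current status."""
    return [(c.name, "ready" if c.is_ready(state) else "waiting")
            for c in components]


# Hypothetical page: the list order is the only "wrapper" there is.
PAGE = [
    Component("input", [], lambda s: "manual.xml"),
    Component("config", ["input"], lambda s: "pdf for " + s["input"]),
    Component("result", ["input", "config"], lambda s: "ran " + s["config"]),
]
```

With an empty state, only the input component reports itself as ready, and trying to run the config component does nothing. Once an input has been chosen, the config component becomes available without anything having told it so, which is the whole point.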

Now, you may think this is obvious, and I have to admit that it now seems obvious to me, too. But I sometimes find it hard to move from one mindset (for example, that automation bit I mentioned, above) to another (such as the situation at hand, the actual environment I implement things in). If this is because I’m getting older or if it’s who I am, I don’t know. In this particular case, I was so convinced that the unified wrapper was the way to go that it got in the way of a better solution.

At least I think it’s a better solution. If it isn’t, hopefully I can change my mind again and in time.

See you at XML Prague.


I’ve been working on an eXist-based implementation of my XProc abstraction layer, ProX, hoping to have something that runs before XML Prague, next month. It turns out that the paper I submitted on the subject has been accepted, so now I guess I just have to.

The ProX implementation should not be terribly complicated to finish, but until recently it risked being rather hackish (is that a word?) because the XMLCalabash eXist module written by Jim Fuller was rather primitive: it would only support pointing out the pipeline to run and one, hard-coded output port. I foresaw a more or less complete rewrite of the ProX wrapper in XQuery.

Luckily, Jim very graciously agreed to rewrite his module into something more immediately usable. I received the first updated module to test in December and the most recent update just a few days ago. He also found a bug in Calabash’s URI handling and sent a first fix to me along with the updated module. There are still things to do but for me, Christmas came really early this year.

Oh, and I’m calling the implementation, and the paper, ProXist. Sorry about that.

Micro XML and Namespaces

Micro XML is an attempt by James Clark, John Cowan and Uche Ogbuji to simplify XML and get rid of all that extra baggage that currently surrounds it. DOCTYPE and PIs are both removed, UTF-8 is mandatory, draconian error handling is no longer a must, and–perhaps most controversially–namespaces are gone, too.

Uche Ogbuji held a brilliant talk about Micro XML at the recent XML Prague 2013 conference, so rather than reiterating his arguments, I suggest you watch the presentation once it’s made available at the XML Prague website.

What I did want to comment about is this namespaces business. Of everything proposed in the Micro XML spec, the removal of namespaces is clearly the most controversial, as indicated by the many tweets following Uche’s talk. But should you be upset? I mean, really?

I’ve done a fair bit of XML stuff involving namespaces lately (yes, I know, there’s no way to avoid it, really). There’s a Relax NG compact schema that I wrote that uses several, including a default “”. There are conversions from external XSD-based XML to that Relax NG-based XML using XSLT 2.0, and there are conversions from the Relax NG schema to (an obviously not namespace-aware) DTD to satisfy the needs of an editor that does not know what Relax NG is. (And I can’t bring myself to write XSDs; they are the spawn of Satan.) And there are XProc-based pipelines that glue these things together, and they obviously need to be aware of the namespaces in addition to the ones they use themselves.

Lots of namespaces, in other words. And I’m not exaggerating when I tell you that a vast majority of the problems I had and the weirdness I encountered had to do with namespaces.

Nothing coming out from the transformation? A forgotten implied default namespace in the source XML. Namespace declarations in the target XML messing up validation? That same default namespace. The wrong prefix for the XLink namespace in the target XML? No explicit namespace declaration in the source. An unwanted and disallowed XLink namespace declaration being complained about in the root element of an XML document in the process of being checked out from a repository? A web service helpfully adding a seemingly missing namespace declaration to the root element of content inside a SOAP envelope, resulting in a document that could not be opened but that did not show any problems in the repository itself, only on its way out…

These are just a few select examples from my plight, and while I may have some of the details slightly wrong here, you probably get the idea. The list goes on.
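The first symptom, nothing coming out because of a forgotten default namespace, is easy enough to reproduce. What follows is a minimal sketch using Python’s standard xml.etree.ElementTree module as a stand-in for the XSLT processor. The namespace URI and the document are invented for the example, and an XSLT match pattern written without the namespace fails in the same spirit, though not through this API.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document; the namespace URI is made up for this example.
SOURCE = """<doc xmlns="http://example.com/ns/doc">
  <title>On Reflection</title>
</doc>"""

root = ET.fromstring(SOURCE)

# Unqualified lookup: finds nothing, because the default namespace
# silently applies to every element in the document.
missing = root.find("title")

# Namespace-aware lookup: map a throwaway prefix to the same URI.
NS = {"d": "http://example.com/ns/doc"}
found = root.find("d:title", NS)

print(missing)      # None
print(found.text)   # On Reflection
print(root.tag)     # {http://example.com/ns/doc}doc
```

The source looks like it has a plain title element, but as far as the parser is concerned there is no such thing, only a title in a namespace that nobody typed a prefix for.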

And why is this all happening? Because someone at some point thought, wouldn’t it be nice if you could share your XML with everyone on the globe with no risk of name collisions and clashing semantics? Wouldn’t it be cool if the conflicting schemas could all be identified using a URI? We could have a throwaway name prefix attached to that URI and implement processing that could hide the prefix for the end user, simplifying things further…

Of course, that someone’s idea of backwards compatibility was simply that to a DTD, the revolution would be hidden in an extra attribute and an element type name containing a colon.

The fact is that I have yet to be helped by namespaces when using XML from the other side of the globe. In fact, I have yet to encounter a situation where I need to process unknown XML where potential clashes in semantics can do harm without me spotting the problem well in advance and taking care of it. The fact is that I don’t often need to use XML from the other side of the globe, out of the blue. It tends to happen in a context, in a controlled manner.

But when I do process that XML, knowing full well the source semantics and how they can map to my needs, it is always the namespaces that cause me grief.

Namespaces are among the least understood features of modern-day XML and among the most abused. The tools range from helpful to disastrous to completely ignorant or just plain wrong, and there are as many reasons for this as there are XML parser implementations out there. You know right from the start that you will have problems, so you’d better resupply the medicine cabinet well in advance or get ready for that headache.

So, Micro XML? Yes, please. Now?