The Uniqueness of Things

Found the below in my Drafts folder, unearthed after I imported my old blog to the WordPress instance on my own server. While it was written six years ago, I thought it was still worth publishing after I read it. I hope you think so too.

Two years after writing this (and having long since forgotten that I did), I presented the concepts behind URNs and the need for uniqueness in document management at XML Finland. The system was finished and done, and I was proud of it. It wasn’t perfect but it was battle-tested and we knew about its weaknesses. I really wanted to talk about it with other markup people, colleagues who knew about angled brackets, and I was sure they’d understand. In fact, I feared some might say they implemented it all years ago, only better. Yet, what is described here also happened at XML Finland; the importance of uniqueness and the advantages of semantic naming using URNs went right past them, judging by the Q&A afterwards.

Or maybe it’s just that I’m wrong.

Anyway, here goes…

===

I’ve been busy finalising an authoring system that is supposed to identify every resource ever stored in it with URNs. What follows is just a rant, but I do think about it and would like to know the whys and the hows. I would like to know why the concept of uniqueness is so difficult to understand.

A URN, of course, is the unique name of a document, as opposed to its location, the URL. Compare with a book in a library. Sometimes books get reorganised in a library, meaning that they will be put on another shelf (another address), but the name will remain the same. The name is unique while the address is not. When identifying content to be reused, this is the principle you need to honour.

Anyway…

It’s been my primary concern all along to ensure that everything is identified with a URN. Everything. If you create a document and link to another, meaning to insert that other document in the one you’re editing, the link should take the form URN#id when the document is checked into the database, where the hash separates the name of the target document from the ID of a node within it. When the document is checked out and open in the XML editor, however, the form should be URL#id, since URLs are what most authoring systems can handle; we need the URL to style the document in the editor, to publish it, and to process it in various ways.

Keeping the URN in a checked-out document is possible, of course, but it has to be replaced with a URL at processing time, one way or another, so we decided to use a URL while a resource is checked out and replace it with a URN at check-in.
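To make the URN#id and URL#id swap concrete, here is a minimal Python sketch. It is not the code from the actual system: the URN scheme, the file locations and the function names are all made up for illustration, and a real implementation would look the mapping up in the database rather than in a dictionary.

```python
# Hypothetical registry: every URN and URL below is invented for illustration.
URN_TO_URL = {
    "urn:x-example:doc:intro:en": "file:///checkout/intro_en.xml",
    "urn:x-example:doc:safety:en": "file:///checkout/safety_en.xml",
}
URL_TO_URN = {url: urn for urn, url in URN_TO_URL.items()}


def _swap(link, table):
    """Replace the part before '#' using the table; keep the fragment as it is."""
    target, sep, fragment = link.partition("#")
    # An unknown target raises KeyError, so a broken link is noticed early.
    return table[target] + sep + fragment


def to_checked_out(link):
    """URN#id -> URL#id, the form the editor and the publishing tools need."""
    return _swap(link, URN_TO_URL)


def to_checked_in(link):
    """URL#id -> URN#id, the form that should be stored in the database."""
    return _swap(link, URL_TO_URN)
```

The point of the sketch is that the node ID after the hash never changes; only the part in front of it moves between name and location, so a round trip through check-out and check-in gives back the original link.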

Early on, we did make a demo application that opened a document containing URNs pointing to other documents, replaced them with the corresponding URLs, normalised the resulting document, and published it using XSL and FOP. It worked like a charm.

Today, I found that the check-in does not replace the URLs with URNs. The file name is a pseudo-URN (with colons replaced by underscores) so I know my URN scheme is being used, but that’s as far as it goes. The URN-like file names remain.
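The pseudo-URN file names are easy to picture with a couple of lines of Python. This is only a sketch of the colon-to-underscore replacement mentioned above; the function name and the example URNs are hypothetical, and I am not claiming this is how the developers wrote it.

```python
def urn_to_filename(urn):
    """Turn a URN into a file-system-friendly pseudo-URN by replacing colons."""
    return urn.replace(":", "_")


# A hypothetical URN becomes a usable file name...
name = urn_to_filename("urn:x-example:doc:intro:en")  # "urn_x-example_doc_intro_en"

# ...but the replacement cannot be reversed safely if a URN may itself
# contain underscores: these two different names collapse into one file name.
clash = urn_to_filename("urn:x-example:a_b") == urn_to_filename("urn:x-example:a:b")  # True
```

In other words, a file name derived this way is only as unique as the scheme allows, which is one more reason to store the real URN in the links and not rely on the file name.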

Talking to a developer, I realised that he hadn’t even thought about it. He was using URNs to identify the resources in the database (the URN being an attribute on the object) but in spite of all our planning, all of our tests, the URLs were left in the links when the document containing them had been checked in. The object IDs in the database are unique, he said, but yes (he admitted), the file names are being used in the database so we can’t store two identically named files in the same folder in the database.

This is not a major problem since we already have the code to do all the work, but what surprises me is that nobody made the connection. Me, I assumed everyone had understood but did not check. I simply assumed that after the tests, the discussions, and the months of development, no-one could fail to understand what the URNs were really for.

Wrong.

What is it that makes the concept of URNs so difficult?

Tommy Emmanuel in Concert

I went to see Tommy Emmanuel do all kinds of things to and with a guitar at the Göteborg Concert Hall last Sunday. It was my second Tommy Emmanuel concert, and I have to say it was even better than the first, in December 2012.

I think everybody should attend at least one Tommy Emmanuel concert in their lifetime.

Clarkson Sacked, Etc

I’m sure most of you know that Top Gear presenter Jeremy Clarkson was sacked after a “fracas” with a producer.

I think it’s sad that the BBC chose a resolution that punishes pretty much anyone who likes the show. Clarkson, I suspect, will simply move on to somewhere else while enjoying a larger paycheck, and it wouldn’t surprise me if Andy Wilman (Top Gear producer), James May and Richard Hammond joined him to create a new show.

The old Top Gear, however, will have to reinvent itself, with new presenters and angry fans less likely to keep on watching. I’m pretty sure that the fantastic figure of 350 million viewers worldwide every week will soon be a thing of the past, and the BBC will have to get that particular piece of their weekly budget from somewhere else. Mainly, I suspect, from the viewers.

Hello world!

I used to host sgmlguru.org in my basement, using an old Debian box and a dynamic DNS feature in my VDSL router. The site would go down at regular intervals, sometimes because I got a new IP address and the DynDNS service didn’t follow, and sometimes because that box runs Debian Unstable and I’m an apt-get junkie, updating the system at least a couple of times a week.

This was rather unreliable and didn’t reflect on my internet presence very favourably, so yesterday I finally had enough and bought myself a $7/month VPS at VPSDime. Nothing fancy, just Debian 7 with 6GB RAM running on OpenVZ. While I’m not an expert by any means, I do have some command line experience on Debian, so setting up a basic server with WordPress and some other stuff via SSH was extremely easy.

I have to say I’m really, really pleased.

A Note

Noting it’s been two months since I last wrote anything here, I feel it is time to add the following:

If you hoped for a new version of ProXist (as hinted by a previous blog entry), sorry. It has not happened yet. It will, eventually.

If you expected something else from me, sorry again. It has not happened yet. It might, if I find out what you’re on about.

Contact me if you want blame assigned.