Semantic Documents

I’m back from XML Finland, where I held a presentation on how to use the concept of semantic documents in content management systems. Not everyone was convinced, but I wasn’t thrown out, either.

A semantic document is the core information carrier, before a language or other means of presentation to an audience is added. It’s an abstraction; obviously, there can be no such thing in the real world, but as a concept, the semantic document is useful.

For example, using this concept, a translation of a document can be defined as a rendition of the original, just as a JPG image can be rendered in, say, PNG without the contents of the image changing. It is very strictly a matter of definition: the rendition is not necessarily identical in all details of content to the original; it’s simply defined to be a matching rendition for a target audience.

Of course, for a semantic document and its rendition in a given language to be meaningful in a CMS, none of those varying details can be significant to the semantics of the basic information carrier; they can only serve to clarify the core information for the target audience. In other words, a translation may differ from the original for, say, cultural reasons (if the details in question are bound to the original language and readership), but the basic meaning cannot be allowed to change.

To the concept I also added version handling, that is, a formal description of the evolution of the contents of the basic information over time. When a new version is required is, of course, also a matter of definition; I’d go with “a significant and (in some way) completed change”. What’s important is that two matching or equivalent renditions of the semantic document must always use matching versions.

Expressed using a pseudo-URN schema, if the core semantic document in some well-defined version (say “1”) is defined as URN:1, the Swedish and Finnish versions would be defined as URN:1:sv and URN:1:fi, respectively. They would be defined to be different renditions of each other but identical in basic information. It follows that if a URN:2:sv were made, a new Finnish translation would have to be created, because the old translation would differ in some way, according to the definition.
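To make the pseudo-URN idea a bit more concrete, here is a minimal sketch in Python. It is only an illustration of the definition above, not how any real system does it; the names PseudoUrn and same_semantic_document are made up for the example, and the three-part "name:version:language" layout is my assumption about how the pseudo-URNs would be split.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PseudoUrn:
    """Hypothetical model of a pseudo-URN such as "URN:1:sv"."""
    base: str                   # identifies the semantic document
    version: int                # version of the basic information
    lang: Optional[str] = None  # None means the language-neutral document

    @classmethod
    def parse(cls, text: str) -> "PseudoUrn":
        # Assumed layout: base:version or base:version:lang
        parts = text.split(":")
        if len(parts) not in (2, 3):
            raise ValueError(f"not a pseudo-URN: {text!r}")
        lang = parts[2] if len(parts) == 3 else None
        return cls(base=parts[0], version=int(parts[1]), lang=lang)


def same_semantic_document(a: PseudoUrn, b: PseudoUrn) -> bool:
    """Two renditions match if they share both document and version."""
    return a.base == b.base and a.version == b.version


sv = PseudoUrn.parse("URN:1:sv")
fi = PseudoUrn.parse("URN:1:fi")
sv_new = PseudoUrn.parse("URN:2:sv")

print(same_semantic_document(sv, fi))      # the two version 1 renditions
print(same_semantic_document(sv_new, fi))  # version 2 against the old Finnish
```

The first comparison holds by definition, while the second does not, which is exactly why a new Finnish translation is needed once URN:2:sv exists.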

This, of course, is largely a philosophical question. In practice, all kinds of questions arise. I had several objections from the floor, of which most seemed to have to do with the evolution of the translation independently from the original. In my basic definition, of course, this is not a problem since the whole schema is a matter of definition, but in the real world, an independent evolution of a translation is often a very real problem.

It could well be that a translation is worked on rather than the original, for example, in a multi-national environment where different teams manage different parts of the content. While this is theoretically perfectly manageable simply by bumping the versions of that particular translation, keeping track of, say, 40+ active target languages becomes a practical problem.

I don’t think the problem is unsolvable if there is a system in place to keep track of all those different URNs, but only if the basic principles are strictly adhered to. For example, you can never be allowed to develop the content in different languages independently from each other at the same time, because the situation that would arise would have to deal with what in the software development world is known as “forking”, that is, developing differing content from the same basic version. While also solvable, the benefits of such an approach in documentation are doubtful.

Far easier and probably better is to define a “master language” as the only language allowed to drive content change. In the above pseudo-URNs, Swedish could be defined as a master language, meaning that any new content would have to be added to it first and then translated to the other languages.
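The master language rule can be sketched in a few lines of Python. This is a toy model under my own assumptions, not a description of any actual CMS: the class SemanticDocument and its methods revise, translate and stale_languages are hypothetical names invented for the example.

```python
class SemanticDocument:
    """Toy model: only the master language may drive content change."""

    def __init__(self, base: str, master_lang: str) -> None:
        self.base = base
        self.master_lang = master_lang
        # Maps each language to the version its rendition corresponds to.
        self.versions = {master_lang: 1}

    def revise(self, lang: str) -> int:
        """Bump the version; refused for anything but the master language."""
        if lang != self.master_lang:
            raise PermissionError(
                f"{lang} is not the master language ({self.master_lang})"
            )
        self.versions[lang] += 1
        return self.versions[lang]

    def translate(self, lang: str) -> None:
        """Record a translation of the current master version."""
        self.versions[lang] = self.versions[self.master_lang]

    def stale_languages(self) -> list:
        """Languages whose rendition lags behind the master version."""
        current = self.versions[self.master_lang]
        return sorted(l for l, v in self.versions.items() if v < current)


doc = SemanticDocument("URN", master_lang="sv")
doc.translate("fi")           # URN:1:fi now matches URN:1:sv
doc.revise("sv")              # URN:2:sv is created
print(doc.stale_languages())  # Finnish needs a new translation
```

Because a change in any other language is simply refused, the forking situation never arises; the price is that every change has to pass through the master language first.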

This is the basic principle behind the CMS, Cassis, that we develop at Condesign. It works, in that the information remains consistent and traceable, regardless of language, and allows for freely modularising documents for maximum reuse.

I would be interested in hearing opposing views. Some I addressed during my talk in Finland, but I’m sure there are more. Is there a reason you can think of that would break the principle of the semantic document?

XML Finland, Not On A Boat After All

XML Finland will not be held on board a boat, after all. The Radisson Blu Seaside hotel in Helsinki is the new venue and the seminar is limited to one day, November 10. The organisers say logistics are to blame. I have to say I’m disappointed. An XML boat would have been fun. Also, I’ll miss the evening snacks and sauna, as I’ll have to catch a plane in the evening.

I wonder if XML Prague can be persuaded to relocate to a boat instead.

Google Plus

Yesterday, I started my browser and found that Google had added You+ to the far left on www.google.com. Being the geek I am, naturally I joined this initiative. Google Plus immediately reorganised my Google settings and all of a sudden, my Blogger page is nowhere to be found. I had to go back to the previous Settings version to find it.

I’m all for change, but I don’t like this type of change.

XML Prague 2012

Speaking of XML conferences, XML Prague 2012 has been announced and will take place a month earlier than the last few times, on February 10-12. The venue is also new, a good thing since the last two events were sold out.

Looking forward to this one already.

Flight Sims

There is a terrific open source flight simulator called FlightGear. It’s freely available for my platform of choice, Debian Linux (and a number of others, including Windows and Mac OS X) and it’s quite mature by now, so naturally it’s what I run when I want to fly a plane these days. When I still had a Windows partition that worked, I have to admit I quite liked Microsoft’s classic Flight Simulator, but my Vista partition doesn’t work all that well and anyway, Microsoft killed off the sim a year or two ago. FlightGear is a more than adequate replacement.

Today I learned that somebody is marketing an older FlightGear version under a different name (Pro Flight Simulator), charging around $50 for a DVD or download and promising free lifetime updates. Of course, there is no (easily found) mention of FlightGear anywhere on their site, and I doubt the source code is easily available, either.

It has to be somewhere, though. See, FlightGear is GPL software, which basically means that you can do whatever you want with the software (including selling copies of it) as long as you also make the source code available. I think the GPL lists a few other conditions as well, but the idea is that software should be free (as in speech).

So what these people do when ripping off free software is most likely not illegal, merely unethical. To establish themselves even more firmly in the gutter, they have produced a number of blogs and fake reviews to market the product, seemingly without any shame; do a Google search if you are interested, but I won’t help their cause by giving you a direct link.

Read all about the scam at http://www.flightgear.org/flightprosim.html, and download a FREE copy of the latest version if you are interested in flight sims. Or just spread the word.

Me and XML in Stockholm

I’ll be talking about XML in Stockholm on June 16th. The event is a one-day tutorial for technical writers, managers and other interested parties, organised by Dokumentinfo. They organise tutorials on various subjects related to document management and archiving, and a yearly conference where I was invited to speak last year.

So far I have few details but I’m pretty sure I’ll manage to include XLink, somehow.