Query vs Change

My friend and fellow XML geek Erik Siegel writes about marketing XML databases in his latest blog post. Basically, Erik says that XML database vendors aren’t doing themselves any favours by marketing their products simply as databases in the strictest sense of the word, that is, as places for storing, indexing and querying data that happens to be XML. Instead, he argues, they should bring forward other relevant points about processing the XML with other cool standards that all begin with the letter X.

I’m not going to argue the points he makes – they are perfectly valid and I agree with them – but one phrase in his list of XML database features struck a chord with me:

Processing Engine: On top of this data storage is always a processing engine. This engine can run XQuery programs for querying and manipulating the database. Besides XQuery it usually implements other X languages like XSLT, XProc, XInclude, Schema validations and the likes.

The emphasis on “manipulating” is mine.

Manipulating, to me, means changing in some way. In other words, manipulating as opposed to querying. You may think I’m arguing semantics, especially considering that both are what you do with XQuery in an XML database: you query and you manipulate.

The problem is that, for me, a database is all about storage: it’s all about storing my data reliably. Yes, the products all add functionality for all kinds of stuff, from queries to, well, manipulation, but to me, the focus is on reliable storage. If I store my data there, I want to know for certain that I can retrieve that same data three years later. I don’t want to query and manipulate my data; I want to query my data and then do stuff with the data outside the storage area, if that makes sense.

That, of course, is where version handling comes in. If you manipulate data, you change it. But if you want to (reliably) store your old data, you first make a copy, then change the copy and store that, preferably linking the two versions with each other in some nifty manner, so that you’ll know that they are related to each other, three years later.
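
To make the copy-then-change idea a bit more concrete, here is a minimal sketch in Python. It is not how any particular XML database does it; `VersionedStore`, `Version` and their methods are hypothetical names made up for this illustration, and the "storage" is just a dictionary in memory. The point is only that a change never touches the stored original: it parses a fresh copy, edits the copy, and stores the result with a link back to its parent.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Version:
    version_id: int
    parent_id: Optional[int]  # link to the version this one was derived from
    xml: str                  # serialized document, never modified once stored


class VersionedStore:
    """Hypothetical in-memory store where every change creates a new version."""

    def __init__(self) -> None:
        self._versions: Dict[int, Version] = {}
        self._next_id = 1

    def put(self, xml: str, parent_id: Optional[int] = None) -> int:
        ET.fromstring(xml)  # raises ParseError if the input is not well-formed
        version_id = self._next_id
        self._next_id += 1
        self._versions[version_id] = Version(version_id, parent_id, xml)
        return version_id

    def get(self, version_id: int) -> Version:
        return self._versions[version_id]

    def change(self, version_id: int, edit: Callable[[ET.Element], None]) -> int:
        # Parsing the stored string yields a fresh tree, i.e. a copy.
        root = ET.fromstring(self.get(version_id).xml)
        edit(root)  # change the copy, not the original
        new_xml = ET.tostring(root, encoding="unicode")
        return self.put(new_xml, parent_id=version_id)

    def history(self, version_id: int) -> List[int]:
        # Follow the parent links from a version back to the first one.
        chain: List[int] = []
        current: Optional[int] = version_id
        while current is not None:
            chain.append(current)
            current = self._versions[current].parent_id
        return chain


def retitle(root: ET.Element) -> None:
    root.find("title").text = "Final"


store = VersionedStore()
v1 = store.put("<doc><title>Draft</title></doc>")
v2 = store.change(v1, retitle)
```

After this runs, `store.get(v1)` still returns the draft exactly as it was stored, and `store.history(v2)` walks the link from the new version back to the old one. A real system would of course need persistence, and probably something smarter than full copies, but the contract is the same.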

Of course, that’s quite a bit more functionality than that simple database for, erm, queries and manipulation, but to me, reliable versioning is what really makes XML databases useful. Without it, I’d be constantly worried about my XQuery skills, which, I have to admit, could be better.