Category Archives: DITA

Topic-based

I recently held a two-day workshop on topic-based information for a client faced with moving from paper-based documentation to multiple outputs in multiple media, especially smartphones. Now, before drawing conclusions, you should know that this particular client does have a reasonably mature process supported by a reasonably mature system. They already produce in XML, they translate their content to multiple languages, and they already publish automatically.

Their information is very much “book-oriented”, however. It’s sequential and it has interdependencies all over the documentation.

Topic-based information had been suggested to them as a means to an end, so my task became to educate them about what is meant by topic-based information, what its intended advantages and frequent challenges are, what standards are out there to support the concepts, and how it all relates to their situation today. And of course, I needed to tell them about DITA, because while DITA is the same thing as neither topic-based information nor multi-channel publishing per se, it has become something of a de facto standard for topic-based information and there is a lot to be learned from it.

I remained largely neutral concerning DITA throughout the workshop, but I was nevertheless forced to reconsider, and in some cases re-evaluate, some of my opinions. DITA is what it is: it is widespread, it is constantly being developed, and it cannot be ignored when discussing topic-based information solutions.

Take strict topic orientation as a simple example. One task, one topic. No dependencies, no context or hierarchy linking the topic to others from within the topic itself, no broken cross-references, et cetera. I have frequently dismissed parts of this as the inevitable consequences of ill-designed systems, but as I was highlighting practical examples from my client’s current information, I did see the value of a single, isolated task beyond mere system limitations. See, while a properly implemented system does help, any dependencies in the information will nevertheless make it more difficult to maintain and update when it is used in several different contexts. I could clearly see this happening in my client’s documentation, and while I’m not at liberty to discuss any specifics, theirs was a very good case for minimalism.

More obvious, perhaps, were the strategies implied by DITA for online documentation. If publishing for a smartphone, for example, size obviously does matter. There is no room for large overviews or tables, nor is there a place for long narratives. There is no way to know how the reader arrived at the current topic, so there is no way to provide that narrative, a longer list of contents, or a list of related topics that aren’t essential but nice to have. This has obvious implications for large content, including eliminating those pesky overviews, but also for how to present single, self-sufficient topics.

You have to make every such topic completely independent of the next or previous ones, because there is no way to know what the next or previous ones were about. The limited space needs to focus on solving the task at hand, so giving references and links is tricky at best.

A topic is only included in a publication later, through a DITA map, and always in a specific context, so the target format is only known when the publication is created. DITA maps are therefore the logical place to put any such references. In other words, the map is where anything context-related belongs: hierarchies, references, and so on.
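A minimal Python sketch of the principle, assuming a hypothetical map with three invented topic files: the topics themselves contain no links, and the parent, child and sibling relationships are derived from the nesting of the map’s `topicref` elements at publishing time. DITA processors can generate links of this kind from a map; this is only an illustration of the idea, not how any particular toolkit implements it.

```python
import xml.etree.ElementTree as ET

# A hypothetical map. The topics it points to contain no links at all;
# the hierarchy exists only here.
MAP = """<map>
  <topicref href="filters.dita">
    <topicref href="replace-filter.dita"/>
    <topicref href="clean-filter.dita"/>
  </topicref>
</map>"""


def context_links(map_root):
    """Derive parent, child and sibling links for each topic from the map."""
    links = {}

    def walk(parent, refs):
        for ref in refs:
            children = ref.findall("topicref")
            links[ref.get("href")] = {
                "parent": parent.get("href") if parent is not None else None,
                "children": [c.get("href") for c in children],
                "siblings": [r.get("href") for r in refs if r is not ref],
            }
            walk(ref, children)  # recurse into nested topicrefs

    walk(None, map_root.findall("topicref"))
    return links


for href, rel in context_links(ET.fromstring(MAP)).items():
    print(href, rel)
```

Put the same topics in a different map and they get different links, without anyone touching the topics themselves.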

DITA is certainly not the only way to achieve strict topic orientation, but it is relatively unique in offering a comprehensive method for achieving it, including minimalist concepts, online documentation requirements, etc., in one place. One could argue the merits of something like S1000D for purpose-filled topical documentation, but while S1000D is many things, I doubt it will ever be accused of minimalism. And these days, DITA is expanding outside its original box within software documentation and, increasingly, solving problems in new domains.

DITA brings with it a number of challenges (that’s the same as “problems” but in presales-speak), many of which have to do with how to restore some of the inherent readability of sequential content meant for paper-based books, and I remain unconvinced in this regard. Markup-wise, the DTD leaves room for improvement, and I think there are better ways to design linking mechanisms (even though DITA includes some clever ID-related tricks). I think specialisation suffers because the original DTD suffers, and I think DITA struggles when it comes to profiling information.

But just as DITA is not the only XML-related standard to offer topic orientation and reuse, it is not the only one with problems. It is perhaps too easy for a grumpy old XML guy like me to dismiss DITA over problems in its execution; there are a lot of good things in it, too, and this blog entry is my way of saying that I am reconsidering.

Who says you can’t teach old dogs new tricks? Next I’ll be embracing Java.

Going to Do DITA

I have a new client and I’m going to do DITA and topic-based information for them. For some reason, all I can think of is Al Pacino and that memorable scene in The Godfather Part III: “Just when I thought I was out, they pull me back in.”

List Modelling

I’ve been reading up on DITA. I’ve looked at the specs and the DTD before, obviously, but more from the perspective of an innocent bystander. The DTDs I implement in authoring systems and elsewhere are usually my own, and whenever I need to deliver content in some other format, I simply convert to it. This time things are a bit different, however, as we are considering doing a “DITA Edition” of the content management system I’m responsible for at work, and I need to know how DITA can fit into our stuff.

DITA’s got lots of things that I like, such as combining topic IDs with target IDs in references to avoid ID collisions. The DITA way is a very elegant solution, and probably a better one than what I would usually do, which is to make sure (in various ways in the DTD and in the authoring environment) that authors can never end up in such situations to begin with. There are other things, too, but those are best left to another blog entry at some point.
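To make the ID trick concrete, here is a small Python sketch. In DITA, a reference to an element inside a topic names both the topic and the element, as in `file.dita#topicid/elementid`, so element IDs only have to be unique within their own topic. The topics, file names and the `resolve_ref` helper are hypothetical, and the sketch assumes one topic per file.

```python
import xml.etree.ElementTree as ET

# Two hypothetical topics that both use the element ID "step1".
# That is fine: element IDs only need to be unique within their topic.
TOPICS = {
    "install.dita": ET.fromstring(
        '<task id="install"><p id="step1">Unpack the box.</p></task>'),
    "remove.dita": ET.fromstring(
        '<task id="remove"><p id="step1">Unplug the device.</p></task>'),
}


def resolve_ref(href):
    """Resolve a reference of the form 'file#topicid/elementid'.

    Returns the referenced element, or None if nothing matches.
    """
    filename, _, fragment = href.partition("#")
    topic = TOPICS.get(filename)
    if topic is None:
        return None
    if not fragment:  # a bare file reference points at the topic
        return topic
    topic_id, _, element_id = fragment.partition("/")
    if topic.get("id") != topic_id:
        return None
    if not element_id:  # 'file#topicid' also points at the topic
        return topic
    # Search only inside the addressed topic, so identical element IDs
    # in other topics can never collide with this one.
    for elem in topic.iter():
        if elem is not topic and elem.get("id") == element_id:
            return elem
    return None


print(resolve_ref("install.dita#install/step1").text)  # Unpack the box.
print(resolve_ref("remove.dita#remove/step1").text)    # Unplug the device.
```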

Here, I want to talk about list modelling, and specifically about something that not only DITA but so many other DTDs and schemas seem to ignore, and that, in my mind, results in bad markup. Let’s start with list semantics:

A list is, well, a list of things. There are several types of lists, of which unordered and ordered are the most common, and the semantics are probably clear enough: the former lists stuff without a specific order (say, grocery lists) and the latter items whose order is significant (for example, David Letterman’s top ten lists). There’s also the definition list (which, in my mind, is not a list at all but a special case of a table, namely a two-column one), and probably some other types as well. In DITA, you can find something called “simple list”, which claims to limit what’s listed to one line per item, tops, without bullets or numbers, but to me that’s less about semantics and more about presentation.

So here’s a typical DITA list (HTML looks exactly like it, and DocBook and quite a few others use the same structure under different element names):

<ul>
<li>Apples</li>
<li>Oranges</li>
<li>Bananas</li>
</ul>

There’s more to list semantics, though, at least in my mind. If you wanted to find a complete list in a document, you’d probably want to include its qualifying introduction (“Here’s the groceries you need to buy:”), and any and all information that goes between list items without being part of them but still belonging to the list as a whole. If your spouse is kind enough to subcategorise the grocery list into vegetables, fruit, dairy products and so on (I know I need the help), we’d have a multi-part list where the participating lists are parts of a larger whole.

The introductory paragraph is where it gets tricky in DITA and similar structures. There are a LOT of block-level elements to choose from, but you cannot easily create a list that meets these requirements. The preferred DITA way (at least if we choose to believe the examples in the spec) lacks a wrapper that identifies the list as one unit rather than a loose paragraph that happens to be followed by a list:

<p>The fruit we need for tonight:</p>
<ul>
<li>Apples</li>
<li>Oranges</li>
<li>Bananas</li>
</ul>
<p>And the vegetables for tomorrow:</p>
<ul>
<li>Cucumbers</li>
<li>Tomatoes</li>
</ul>
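Without a wrapper, software that needs to find “the complete list” has to guess. The Python sketch below pairs each list with an immediately preceding paragraph that ends in a colon; that heuristic and the `guess_list_units` name are assumptions made purely for illustration, not anything DITA defines, and the guess breaks as soon as an author phrases an introduction differently.

```python
import xml.etree.ElementTree as ET

# The loose paragraph-plus-list markup from the example above.
BODY = """<body>
<p>The fruit we need for tonight:</p>
<ul><li>Apples</li><li>Oranges</li><li>Bananas</li></ul>
<p>And the vegetables for tomorrow:</p>
<ul><li>Cucumbers</li><li>Tomatoes</li></ul>
</body>"""


def guess_list_units(body):
    """Pair each list with the paragraph that presumably introduces it.

    Heuristic (an assumption, not a DITA rule): a <p> whose text ends
    with a colon and is immediately followed by a <ul> or <ol> introduces it.
    """
    units = []
    children = list(body)
    # Walk (previous sibling, current sibling) pairs.
    for prev, elem in zip([None] + children, children):
        if elem.tag not in ("ul", "ol"):
            continue
        intro = None
        if prev is not None and prev.tag == "p":
            text = "".join(prev.itertext()).strip()
            if text.endswith(":"):
                intro = text
        items = ["".join(li.itertext()) for li in elem.findall("li")]
        units.append((intro, items))
    return units


for intro, items in guess_list_units(ET.fromstring(BODY)):
    print(intro, items)
```

Even when the guess works, the result is two separate units; nothing in the markup says they belong to one grocery list.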

Of course, one could argue that our grocery list is really a section, but I would argue that the introductory paragraph is part of the list and not necessarily part of the section as a whole. What if I wanted to include images or perhaps a note in that section? Semantically, I can think of dozens of ways to reasonably expand the structure of such a surrounding section and still keep it on topic (that is, limited to subject matter concerning that central grocery list).

Keeping with DITA’s topic-based approach, we could certainly use a number of such sections and wrap the whole thing in a topic, but me, I think that’s overkill. All I want to do is include an introductory paragraph.

This, of course, is where some will argue that the introductory paragraph is really a heading. Definition lists in DITA and some other DTDs actually do have a heading for this very purpose, which to me hints that somebody did consider the subject at some point, but then why don’t the “ordinary” lists have that heading? And of course, me, I think that introduction is not a heading at all, only a qualifier for the list.

Another option in DITA and others is to use the <p> element as a wrapper:

<p>The fruit we need for tonight:
<ul>
<li>Apples</li>
<li>Oranges</li>
<li>Bananas</li>
</ul>
And the vegetables for tomorrow:
<ul>
<li>Cucumbers</li>
<li>Tomatoes</li>
</ul>
</p>

This is perfectly valid, of course, but it ruins the intent of the <p> element and creates very odd (and ugly) mixed content that would be difficult to process properly.
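To see why, here is how Python’s standard `xml.etree.ElementTree` module represents the wrapped version (whitespace condensed slightly): the second introduction is not an element at all, but the `tail` text of the first `<ul>`.

```python
import xml.etree.ElementTree as ET

# The <p>-wrapped grocery list from above, slightly condensed.
WRAPPED = """<p>The fruit we need for tonight:
<ul><li>Apples</li><li>Oranges</li><li>Bananas</li></ul>
And the vegetables for tomorrow:
<ul><li>Cucumbers</li><li>Tomatoes</li></ul>
</p>"""

p = ET.fromstring(WRAPPED)

# The first introduction is the text of the <p> element itself ...
print(repr(p.text.strip()))

# ... but the second one is stored as the tail of the first <ul>, so code
# that only walks child elements never sees it.
first_ul = p[0]
print(repr(first_ul.tail.strip()))

# The only child elements are the two lists.
print([child.tag for child in p])
```

A processor that only walks child elements will silently skip the second introduction.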

What I would like to see is something more along the lines of this:

<ul>
<p>The fruit we need for tonight:</p>
<li>Apples</li>
<li>Oranges</li>
<li>Bananas</li>
<p>And the vegetables for tomorrow:</p>
<li>Cucumbers</li>
<li>Tomatoes</li>
</ul>

Now we have a single list (our grocery list) that includes the necessary introductions. Of course, it’s still somewhat ugly. I, for one, dislike the relative lack of list item structure; I’d much rather see an item modelled more properly, perhaps divided into paragraphs and other block-level content, so that block-level and inline content remain properly separated.
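Converting the proposed model into something DITA accepts is straightforward. The Python sketch below assumes the hypothetical wrapper shown above (a `<ul>` containing `<p>` and `<li>` children, which is not valid DITA) and splits it into the paragraph-plus-list sequence of the first grocery example. The `to_dita_blocks` name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The proposed (hypothetical, non-DITA) model: intros live inside the list.
PROPOSED = """<ul>
<p>The fruit we need for tonight:</p>
<li>Apples</li>
<li>Oranges</li>
<li>Bananas</li>
<p>And the vegetables for tomorrow:</p>
<li>Cucumbers</li>
<li>Tomatoes</li>
</ul>"""


def to_dita_blocks(wrapper):
    """Split a wrapper list into DITA-style sibling blocks.

    Each <p> is kept as is, and each run of consecutive <li> items
    becomes its own list element with the wrapper's tag (<ul> or <ol>).
    """
    blocks = []
    current = None  # the list collecting the current run of items
    for child in wrapper:
        child.tail = None  # drop the whitespace between elements
        if child.tag == "li":
            if current is None:
                current = ET.Element(wrapper.tag)
                blocks.append(current)
            current.append(child)
        else:
            blocks.append(child)
            current = None  # the next <li> starts a new list
    return blocks


for block in to_dita_blocks(ET.fromstring(PROPOSED)):
    print(ET.tostring(block, encoding="unicode"))
```

Going the other way, from loose paragraphs and lists to the wrapped model, means guessing which paragraph belongs to which list, which is exactly what the wrapper would make unnecessary.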

DITA in Review

I’ve been thinking about DITA, partly because of the comments from Michael Priestley regarding my previous DITA post, but also because I recently had to prepare a quote for a prospective client.

On one hand, I still maintain that a generic structure can practically never be as immediately relevant to a client as a structure tailored to their needs. I’ve seen this happen many times in the past when comparing various so-called industry standards with the actual needs of my clients. Structures have mapped poorly, which is to be expected, but the same has been true of meta-data, which, in a way, is more surprising, considering that meta-data should be something the industry standards get right.

On the other hand, shortly after my latest DITA post, a prospective client requested a quote for replacing their current CMS. They’ve been authoring topic-oriented pieces of information for online publishing, with the topics sometimes collected in larger PDFs that are printed out and placed in binders. What they wanted was better version handling, integration with PDM applications, and an environment that would better support authoring individual topics published in various contexts. There were very few company-specific structures or meta-data requirements standing in the way.

Individual, loose topics, published in various contexts and deliverables, mostly online, sometimes on paper but as collections in binders. Hmm… where have I heard this line of thinking before?

Since several editors out there have feature-rich, easily adaptable DITA support, the quote was quite easy to prepare. It’s certainly easier to offer a figure when many of the unknowns are already taken care of, and this one practically screamed DITA.

Maybe the RFQ was a practical joke from the DITA folks. You’d tell me, right?

DITA

For the last year or three, XML editor makers have been busy coding DITA customizations into their products. The latest editor to get DITA support is Oxygen, my XML IDE of choice these days. It’s the latest fad, see, and there’s money to be made.

But I’m not convinced, and here’s why:

DITA claims to make life easier for users by splitting documents into smaller, reusable pieces, hinting that this is a fresh, new approach to documentation. It’s not, however; some of us have done this for years in our DTDs, long before XML was even thought of, simply because that’s one of the main points of structured information. It’s the sensible thing to do, and a good reason why structured information is useful in the first place.

Now, this is all well and good, but because DITA needs to appeal to a lot of users, it is a generic structure, and it’s big. Both are unfortunate: bigger means more difficult to learn, for users and developers alike, and generic means that to apply the structure to your specific needs, an abstraction (customization) level is needed.

Generic also means that any markup specific to one user’s needs will have to be added, which means more customization.

With DITA comes a package of stylesheets and utilities, also big and generic, hard to learn, and in need of customization, not only to add the user-specific requirements, but also to modify their look and feel. After all, you don’t really want to have your documents look like the next guy’s, do you?

See, what the DITA advocates are saying, basically, is that either you do want that, or you need to customize.

My view of document structures is just the opposite, really. I’d much rather go with writing a customer-specific DTD, if at all possible, just as I’d go for customer-specific stylesheets and other customizations and tweaks. In that way, I could make the structures, utilities, and stylesheets immediately relevant to the customer, thereby saving time spent trying to learn a generic structure and then trying to apply it to your needs.

That customer-specific DTD will practically always be smaller than any generic one; I know every single DTD I’ve ever created has been, including the package of DTDs I wrote for a large automobile manufacturer for all of their aftersales documentation. At the same time, the DTD will most likely be far more relevant to, and far better fine-tuned for, the customer’s needs.

And yet, it would be just as easily customizable as DITA or some other “standards-based DTD”.

When I’m lecturing on XML and document structure management, I always stress that we use XML because we like to convert XML to other formats, not because we want it to remain the same. If some other company needs DITA documents from us, fine! I doubt it, but if the need arises, it’s easy, even trivial, to convert a customer-specific structure to a generic one.
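As an illustration of how small such a conversion can be, here is a Python sketch that maps a hypothetical customer-specific procedure (the `<procedure>`, `<heading>` and `<action>` elements are invented for this example) onto DITA’s task structure of `<task>`, `<title>`, `<taskbody>`, `<steps>`, `<step>` and `<cmd>`. A production conversion would more likely be written in XSLT and would have to cover far more of both content models.

```python
import xml.etree.ElementTree as ET

# A hypothetical customer-specific procedure; the element names are invented.
SOURCE = """<procedure id="replace-filter">
<heading>Replacing the filter</heading>
<action>Open the cover.</action>
<action>Pull out the old filter.</action>
<action>Insert the new filter.</action>
</procedure>"""


def procedure_to_dita_task(proc):
    """Map the hypothetical <procedure> structure onto a DITA task."""
    task = ET.Element("task", id=proc.get("id"))
    ET.SubElement(task, "title").text = proc.findtext("heading")
    steps = ET.SubElement(ET.SubElement(task, "taskbody"), "steps")
    # Each customer-specific <action> becomes one DITA <step>/<cmd>.
    for action in proc.findall("action"):
        step = ET.SubElement(steps, "step")
        ET.SubElement(step, "cmd").text = action.text
    return task


task = procedure_to_dita_task(ET.fromstring(SOURCE))
print(ET.tostring(task, encoding="unicode"))
```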

See, DITA to me is just another DocBook. It’s a standard, true, but it’s just another standard among a thousand other standards. It’s open, also true, but so are a thousand others. And of course it claims to be easily customizable, but that’s obviously the case with those thousand other standards, too.

But it’s also big and generic and not very relevant as such to any specific requirement, not without an abstraction level or two.