As a way to celebrate what was the 10th anniversary of the DITA standard in 2015, I did a series of interviews with people who were there at the very beginning. Two of these key people were (and are) Don Day and Michael Priestley. Formerly with IBM, Don Day is now a DITA and XML consultant with Learning by Wrote, based out of Austin, Texas. Don received official recognition for his contribution to reengineering IBM’s information assets, and has also been honoured as an OASIS Distinguished Contributor for his work leading the original OASIS Darwin Information Typing Architecture (DITA) Technical Committee. Michael Priestley is currently an Enterprise Content Technology Strategist with IBM, where he has worked for the past 22 years. Michael played a key role in helping to establish DITA and DITA-based processes at IBM, and for many has been IBM’s most public face relating to DITA, delivering significant publications and presentations in support of the standard. He is also the chair of the Lightweight DITA Committee at OASIS.
DITAWriter: Could you tell me about the path that would eventually lead to the creation of DITA? I understand that IBM has been using structured authoring for some time, so what were you using and what were the reasons for wanting to move to the topic-based architecture that would eventually become DITA?
Don Day: “You are in a maze of twisty little passages, all alike.” That phrase from the game “Colossal Cave Adventure” aptly describes IBM at the time the seeds of DITA were germinating. The colossal company was seemingly alike in name, but each departmental “little passage” within it was often unaware of other passages in the matrix. Michael Priestley and I supported two different passages, and we had occasionally butted heads over inconsistent processes and locally-developed tools of our respective passages. We were both adamant that our solutions be driven by the pure desire to serve our respective users. But even though IBM had an available markup and publishing technology (an SGML precursor originally called Information Structure Identification Language (ISIL), later productized as BookMaster), there were no standard tools to do anything other than conventional printing with the content. But IBM did have a vibrant tools culture that all departments could tap into for sharing programs, procedures, and knowledge. Michael and I were the tool gurus in our respective departments, doing the best we could to meet very different requirements based loosely on the common BookMaster markup model. But since no standard parsers existed for the markup, most departmental tools served particular markup variants for particular delivery requirements, which impeded reuse of content for something new at the time: supporting applications that ran on more than one operating system and hardware base.
My journey took me into working with Eliot Kimber, Wayne Wohler, Simcha Gralla and others on an SGML upgrade to BookMaster that we called IBMIDDoc. By now, an SGML parser was available on at least one workstation platform, so the company could finally pump books into other formats with tools that were endorsed by a strategy organization that advised on common methods and tools across the company (then called Customer and Service Information, C&SI). This was better, but not perfect, particularly when the needs of the “online help” advocates and of “cross-platform books” advocates still led to continued one-off tool development.
You might guess where Michael and I fit in as community advocates. The introduction of the World Wide Web as a content delivery platform opened up a new frontier of variant output requirements since there were few standards for creating and organizing web content at the time. The sig line of one of my mentors said: “For a man whose only tool is a hammer, every problem looks remarkably like a nail”, and indeed IBM’s fast path to the web involved converting whole books into equivalent web deliverables and HTML help systems, again still not using fully-compatible approaches to markup for each kind of deliverable.
Michael Priestley: My path was a bit different from Don’s. I started at IBM as a student back in ’92 authoring OS/2 help, in a BookMaster relative called IPF created specifically for online help. What training I had was from a Bill Horton workshop on online help authoring and information typing.
Once I came on with IBM full-time, my first project was leading the creation of a single-sourced user guide that would be available in both printed and help forms. It pulled together a variety of components, some of which had been print-only and some online-only.
When Don’s team rolled out IBM’s SGML solution with IBMIDDoc, my area was one of the early adopters, but we bounced: we found we couldn’t get the online navigation and linking we needed. So we migrated from BookMaster and IPF to IBMIDDoc SGML, and then straight on to HTML.
At that point my team came up with the classic concept/task/reference typing. The original types were defined by three senior members on my team (I was a team lead at this point but relatively junior compared to them): Jamie Roberts, who also taught rhetoric and professional communication at York University; Laura Rintjema, a savvy and design-focused senior writer; and Dennis Bockus, an editor with a ton of industry experience beyond IBM. The story is they argued it down to the three core types over dinner and wrote it down on a napkin.
When the corporate team took a look at the next generation of tooling beyond SGML, I was chosen to represent the Toronto Lab, half of which were users of the IBMIDDoc SGML solution, the other half users of HTML. I definitely provided an outside perspective on the team. I had experience as an Information Architect for a new product, as well as experience coding with XML, but little to no experience with the existing standard SGML system.
I was an Information Architect for WebSphere Component Broker right before DITA—it was a plum assignment, architecting and writing the docs for a v1.0 product. That is where I learned XML, since the early versions of the product used it extensively. The Director of Information Development for the IBM Toronto Lab nominated me to work on the workgroup with Don and I joined at the start of 2000. After that I kept writing docs for smaller components of various IBM products developed at the lab, until I came over full time to work on DITA transform development.
I joined Don’s workgroup as an SME/user representative, but because of my product experience I was also quite familiar with XML. There were two main design directions in the workgroup: one option was to use the migration from SGML to XML as an excuse to clean up and simplify the SGML architecture, with a general-purpose markup language structured around nested titled containers. The other option was to come up with an entirely new standard based around the corporate information typing standards (concept/task/reference).
Although the concept/task/reference types were also standard across IBM, the standard was not consistently enforced, and many groups were still doing book-oriented content, simply republished to the web. There was a real tension between these two groups. We ended up pursuing both: a generic nested topic structure, and specific concept/task/reference structures. Don worked on the topic definition, and I worked on the concept/task/reference types.
One of my goals for the content types was to specialize the reference type to accommodate API documentation. I’d been the team lead for IBM’s Open Class Library docs, and I came up with the idea of specialization after looking at various options including architectural forms (an approach developed by Eliot Kimber). I modified the architectural forms idea to encode awareness of a type hierarchy (making the sequence of class values show an inheritance hierarchy) and to use defaulted attributes rather than a separate system, and I wrote up a set of rules that would ensure specialized types could be processed as instances of their ancestors.
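To make the mechanics concrete, here is a minimal Python sketch of how a sequenced class value can let a processor treat a specialized element as an instance of its ancestors. It is an illustration under stated assumptions, not the original IBM transform code: the class value format follows the convention used in published DITA (for example "- topic/topic reference/reference "), while the apiRef type and all function names are hypothetical.

```python
def ancestry(class_value):
    """Return the module/element tokens of a class value, most general first."""
    # Published DITA class values start with a "-" or "+" marker; skip it
    # by keeping only tokens of the form module/element.
    return [token for token in class_value.split() if "/" in token]


def is_a(class_value, ancestor):
    """True if the element is the given type or a specialization of it."""
    return ancestor in ancestry(class_value)


def pick_handler(class_value, handlers):
    """Choose the most specific handler registered for any ancestor type."""
    for token in reversed(ancestry(class_value)):
        if token in handlers:
            return handlers[token]
    raise KeyError("no handler for " + class_value)


# Hypothetical handlers: a processor that knows only the base types.
def render_topic():
    return "generic topic output"


def render_reference():
    return "reference output"


HANDLERS = {
    "topic/topic": render_topic,
    "reference/reference": render_reference,
}

# Hypothetical API reference type specialized from reference. In a DTD this
# value would be a defaulted attribute, so authors never type it.
API_REF_CLASS = "- topic/topic reference/reference apiRef/apiRef "

handler = pick_handler(API_REF_CLASS, HANDLERS)
print(handler())
```

Because the processor has no handler for the hypothetical apiRef type, it falls back to the nearest ancestor it does know, which is the behaviour the specialization rules are meant to guarantee.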
As an aside: I attended Extreme Markup Languages around this time, which is when I first met Eliot in person. I was presenting on some transform work I’d done for my HTML doc team that ended up becoming DITA maps, but DITA wasn’t in the public eye at this point. I shared with Eliot what I wanted to do with specialization using napkin scribbles. I explained the whys and the hows (defaulted atts, sequenced values, etc.) and asked him for his opinion. He said “it’s a kludge. But it works.” That gave me a great boost of confidence to take it to the next stage.
The big breakthrough was when we realized that we could use specialization to reconcile the different content approaches—generic vs. information typed—by making concept/task/reference specializations of topic, instead of base types in their own right.
I wrote up the draft doctypes and specializations, and proof-of-concept XSLT. I did a screenshare demo to show Don (the workgroup lead) and Dave Schell (the lead for Corporate ID, and sponsor of the workgroup). Once it was proven to work, the whole workgroup focused on this approach, with the strong direction and support of Dave Schell. Don developed a complete HTML transform (single-pass in those days), and we wrote up a developerWorks article and attached the XSLT and DTDs as samples.
At that point we didn’t have maps, or domain specialization, or anyone using the architecture within IBM.
We started getting emails from people using it, including France Baril from IXIASOFT, and Sirpa Ruokangas from Nokia. We even met with the Nokia team a couple of times to make sure they could implement it, and be co-founders of the DITA Technical Committee once we contributed to OASIS. We already knew by this point that we didn’t want to keep DITA proprietary, the way we had with our IBMIDDoc SGML architecture.
So development of DITA continued within IBM, even as adoption began outside of the company. Susan Carpenter led the first pilot of DITA with IBM WebSphere Application Server docs. They didn’t have maps yet, so they were actually building out the navigation based on metadata in the topics, instead of the other way round.
We started a new workgroup to define maps – the major inputs were the toolkit transforms I’d been using with my HTML authors (that I’d presented at Extreme Markup Languages), plus a Lotus Notes system that abstracted links in help topics into side-files—that work was represented by John Hunt. Erik Hennum worked out how to do domain specialization, and that got added to the architecture too.
The next stage was full implementation of DITA in our IBM tooling, called the Information Developers Workbench (IDWB). Up until this point I’d been working half-time on the corporate team while continuing to write docs for products on the side. Now I got a full-year contract with the corporate team to work on the transforms, including multi-stage HTML transforms. Robert Anderson and I worked on extending and formalizing the transformation pipeline. Other members of the IDWB team worked on editor support, etc. We handled PDF output by writing a transform from DITA to IBMIDDoc so we could keep using its output process, but had an entirely new process for HTML web and help formats.
My next job after that was writing the user guide for DITA within IBM. I used it as a showcase for scenario-based content development, with the navigation and tutorials based around a hierarchical task analysis, and separate navigation schemes for PDF vs. web.
Finally we got to contribute the standard to OASIS. We tried to contribute the transforms as well (what became the DITA OT) but they weren’t prepared to own code so we started up a new SourceForge project. That took about a year of approvals, and ran in parallel with the DITA Technical Committee work to validate and standardize the specification. Much of it—especially the language reference—was based off that user guide I’d written for IBM, and there are still places where that language sticks around.
For most of our time working together at IBM, Don and I never met face to face. We’ve probably met more since he left than we ever did when we were at the same company. In the early days we had competing perspectives and represented very different communities—I think in retrospect DITA owes a lot to the creative tension between us.
[End of Part One]