At the conclusion of my previous article on working with the Claude AI, I said that I planned to continue the work of converting legacy content to DITA using AI to see if there was anything else I could learn from the experience. I have just finished posting a newly DITA-fied Model T car manual to GitHub, available for anyone to work with. And I certainly did learn a few more things about converting content using AI along the way. The conversion is essentially a “re-interpretation” of the original Model T manual by Claude; it is far from being an exact replica of the original, and that has a lot to do with how an AI like Claude goes about converting content.
The Process
The source content was an old manual from Project Gutenberg for the original Model T car, dating to 1919 and long out-of-copyright. The original document was written as a series of frequently asked questions (FAQs), though sometimes oddly worded by our standards. While there are many straightforward questions like the following:
- How is the Car started?
- What causes “Knocking” in the Engine?
- How is the Front Axle removed?
there are also a number of odder phrasings, like:
- When the Magneto gets out of order—what?
- When the Valves and Push Rods are worn—what?
- If the Coil and Plug are right—what?
- How is a Weak Unit detected?
In each of these cases the dangling “what” at the end is shorthand for “what to do?” or “what should be done?” And in true narrative fashion for these old manuals, each FAQ question builds on the previous one, so the “weak unit” question relates to the answer to the previous question on “coil units”. In most cases it was easy to determine which DITA topic type to use for capturing the gist of the answer, though sometimes it made more sense to break a long answer into several topics, or to change the topic type when the conversions from Claude didn’t work out.
A typical prompt to Claude to create a topic was “Please convert the following to a DITA [specific type of] topic, including a short description” followed by a paste of the content from the original manual. In a few cases, I asked Claude to choose the appropriate DITA topic type for the pasted content, but the results were mixed, and I found it easier to guide the AI by specifying the topic type myself. The result was 179 topics, plus a bookmap, sub-maps, and associated images, all posted to GitHub as a complete manual.
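For anyone who wants to repeat this kind of conversion, the prompt pattern above can be captured in a small helper. This is only a sketch: the function name and the list of topic types are assumptions made for illustration, and the prompts in this project were pasted by hand into the Claude chat window rather than generated by a script. Only the prompt wording comes from the process described above.

```python
# Hypothetical helper: the function name and this set of topic types are
# illustrative assumptions. Only the prompt wording comes from the article.
DITA_TOPIC_TYPES = {"concept", "task", "reference", "troubleshooting", "glossentry"}


def build_conversion_prompt(topic_type: str, source_text: str) -> str:
    """Return the conversion prompt for one chunk of the legacy manual."""
    if topic_type not in DITA_TOPIC_TYPES:
        raise ValueError(f"unknown DITA topic type: {topic_type!r}")
    return (
        f"Please convert the following to a DITA {topic_type} topic, "
        "including a short description\n\n"
        f"{source_text.strip()}"
    )


if __name__ == "__main__":
    # The question is one of the FAQ headings from the original manual;
    # the answer text would be pasted in after it.
    print(build_conversion_prompt("task", "How is the Car started?"))
```

Keeping the topic type as an explicit argument mirrors the approach that worked best here: the human picks the topic type, and the AI does the conversion.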
Consistency is Not an AI Strength
A good example of the inconsistencies was the AI’s use of seemingly random punctuation rules for things like lists or steps in a task: a period at the end of each point, or only at the end of a series of points, or none at all. I thought about making things consistent and effectively imposing a specific style for the document, but I thought it was more instructive (and admittedly less work) to keep the variations in style as an example of what AI can and cannot do at the moment.
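The punctuation drift described above is the kind of thing a short script can surface. What follows is a minimal sketch, not something that was run on this project: it assumes the topic is available as an XML string, looks only at li and cmd elements, and uses an invented sample topic rather than one from the converted manual.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample for illustration only; it is not taken from the manual.
SAMPLE = """<task id="sample">
  <title>Sample task</title>
  <taskbody>
    <steps>
      <step><cmd>Open the throttle.</cmd></step>
      <step><cmd>Close the switch</cmd></step>
    </steps>
  </taskbody>
</task>"""


def trailing_punctuation_styles(dita_xml: str, tags=("li", "cmd")) -> Counter:
    """Count how list items and step commands end: period, none, or other."""
    root = ET.fromstring(dita_xml)
    styles = Counter()
    for elem in root.iter():
        if elem.tag not in tags:
            continue
        # Join nested inline text so that <cmd>Close the <b>switch</b></cmd>
        # is judged on its full text.
        text = "".join(elem.itertext()).strip()
        if not text:
            continue
        if text.endswith("."):
            styles["period"] += 1
        elif text[-1].isalnum():
            styles["none"] += 1
        else:
            styles["other"] += 1
    return styles


if __name__ == "__main__":
    # More than one style in the result means the topic mixes conventions.
    print(trailing_punctuation_styles(SAMPLE))
```

Run across a whole folder of topics, a tally like this would show at a glance which sessions followed which punctuation convention.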
Claude would consistently use one style within a single session, but if I had to start a new session (due to prompt limits with the free version of Claude I was using) the previous context was lost. The contrast was most obvious in the “Summary of Engine Troubles and Their Causes”, where the first series of topics was done in a complex, multi-faceted manner, while the rest, done in a new session but using the same prompts, came out as a stripped-down version of the same type of material.
To be fair, I think a lot of this could be overcome by crafting more precise prompts, or by having a way to customize the output from an AI so that it is aware of a house style it needs to follow.
While these inconsistencies in the conversion process were annoying, on the whole the conversion went very well, producing fully understandable, self-contained DITA topics. The inconsistencies I encountered have more to do with an “untuned” AI tackling a project piecemeal across multiple sessions. Ideally, an AI would be run in-house and would be asked to work across not just one converted document, but the corpus of a company’s technical documentation.
The AI Can Be Both a Technical Writer and a SME
One thing I discovered as I converted more content with Claude is that it would occasionally sneak in extra information that was not in the original content.
There was one example where I asked Claude to convert some content and the original content used some obviously dated car terminology, so the AI came back with converted text that mentioned that the process applied only to vintage vehicles.
In the end, I edited the topic to remove that wording, but it was interesting that Claude “knew” that the process applied to an antique vehicle.
Another case in point is the set of glossary topics, which are wholly made up by Claude. For each glossentry, I asked Claude to come up with a definition for a given term that was contemporary to the Model T car. Sure enough, Claude stepped up and gave solid definitions for each glossentry term I asked it to produce, usually mentioning the Model T context. So Claude was fully capable of being both a technical writer and a subject matter expert (SME).
If an AI could be trained on the content already produced by a firm, it should be able to provide SME-like guidance for technical writers. Given how hard it can be to get confirmation from actual SMEs on content for review, having the ability to lean on an AI for some insights will be of real value to technical writers, though there is an obvious issue if AI is leaned on too much.
The Human Factor (aka DITAWriter)
While all of the topics were produced by Claude, I used Oxygen to validate the code, and more often than not tweaked things where the structure was invalid or when Claude invented new DITA elements. The more complex the topic type—particularly the troubleshooting and glossentry topic types—the more likely Claude was to create new DITA tags and invalid ways of structuring DITA content.
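A quick pre-check for invented tags can be sketched in a few lines. It is not a substitute for validating against the DITA DTDs in a tool like Oxygen, which is what was actually done here: the set of known element names below is a small illustrative subset chosen for a troubleshooting topic, the symptom element in the sample is a made-up stand-in for the kind of tag an AI might invent, and the check looks only at element names, not at whether they are nested correctly.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of element names for a troubleshooting topic.
# A real check would need the full vocabulary from the DITA DTDs.
KNOWN_ELEMENTS = {
    "troubleshooting", "title", "shortdesc", "troublebody", "condition",
    "troubleSolution", "cause", "remedy", "responsibleParty",
    "steps", "step", "cmd", "p", "ul", "li",
}

# Invented sample; <symptom> stands in for a tag the AI made up.
SAMPLE = """<troubleshooting id="sample">
  <title>Sample topic</title>
  <troublebody>
    <condition><p>Invented sample text.</p></condition>
    <symptom><p>Invented sample text.</p></symptom>
    <troubleSolution>
      <cause><p>Invented sample text.</p></cause>
      <remedy><steps><step><cmd>Invented sample step.</cmd></step></steps></remedy>
    </troubleSolution>
  </troublebody>
</troubleshooting>"""


def unknown_elements(dita_xml: str, known=frozenset(KNOWN_ELEMENTS)) -> list:
    """Return the sorted element names that are not in the known set."""
    root = ET.fromstring(dita_xml)
    return sorted({elem.tag for elem in root.iter() if elem.tag not in known})


if __name__ == "__main__":
    print(unknown_elements(SAMPLE))
```

A check like this only flags names it does not recognize, so a topic can pass it and still fail proper validation because of invalid structure.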
Claude could not “see” the images from the original HTML copy of the content, so I added all of the image-related code. And where a topic referred to an image in another topic, I set a relationship table link to it instead of using an xref, following best practices. And while Claude came up with the individual glossentry topic content, I set up the map and backmatter material, and inserted links to the glossentry topics in the body content. (See my previous article on how to set up a glossary in DITA, which is a good practical guide on what to do.) Just about all links in the document are key-based, backed by a standard key definition map for all topics (keydef_map.map), a separate image store (image_store.map), and a glossary map (glossary_map.map). The document is designed to be a good example of how to put together a standard manual built on DITA. I was hoping that there might be subsequent versions of the Model T manual to demonstrate versioning using conditions, but I could not find any. There is a good example of versioned manuals for the later Model A car from the early 1930s, but I will have to wait a few more years until it falls out of copyright. Feel free to download a copy of the files to play with and learn from.
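To give a sense of what a key definition map involves, here is a rough sketch that builds one from a list of topic files. The maps in this project were put together by hand, so the function, the title text, the file names, and the rule of using the file name as the key are all assumptions for illustration; the actual contents of keydef_map.map may be organized differently.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath


def build_keydef_map(topic_hrefs: list) -> str:
    """Build a DITA map with one keydef per topic, keyed on the file name."""
    root = ET.Element("map")
    title = ET.SubElement(root, "title")
    title.text = "Key definitions"  # assumed title, not from the project
    for href in sorted(topic_hrefs):
        # Assumption: the key is the file name without its extension.
        key = PurePosixPath(href).stem
        ET.SubElement(root, "keydef", {"keys": key, "href": href})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical file names, used only to show the shape of the output.
    print(build_keydef_map([
        "topics/starting_the_car.dita",
        "topics/removing_the_front_axle.dita",
    ]))
```

With a map like this in place, body content can link by key (for example, an xref with a keyref attribute) instead of by file path, so a topic can be renamed or moved by updating a single keydef.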
AI-based Conversion is Here Now
This was my conclusion at the end of my last article on AI-based content conversion to DITA, and it still rings true to me. As I said in a comment at the end of the previous article, I think AI will give existing document conversion services a run for their money, though I wouldn’t be surprised if they are already on top (or ahead) of this development, and have refined their services to include customized AI-based conversion. And the way I converted the content was definitely not ideal, spread as it was over a couple of weeks’ worth of free sessions from Claude, rather than done in a more cohesive single session that would have smoothed out most of the consistency problems I encountered.
While I could use an online AI to work on this long out-of-copyright manual, I believe most firms would insist on an on-premises LLM or a secured SaaS offering to ensure that their intellectual property does not accidentally leak. I don’t see an AI like Claude replacing technical writers anytime soon. Still, there is definitely a role for such tools at any firm that seeks further efficiencies in producing technical content.
Postscript: After writing this piece I watched a video presentation about the updates to the Oxygen AI Positron Assistant. It suggests that they may have solved the consistency issues I found when using Claude over multiple sessions to convert content. I have the latest version of Oxygen and a review of the AI Positron Assistant is overdue, so expect a future article on that topic.