Parsing documentation to find bugs

This is mostly a footnote for my review of Knuth's Literate Programming book. Read that first.

Undocumented arguments often occur where an argument wasn't considered in the code, or where the design is wrong. 50-60% of undocumented arguments were also buggily handled at Trolltech.

That correlation depends on the documentation being written by hand and by programmers. Generating documentation like Ghostdoc does breaks the correlation completely, and starting with mostly-generated text like that from Eclipse also weakens it to uselessness. If you, as a programmer, forget something about the argument while implementing, but someone/something else writes the documentation, then there is little or no correlation between what you forget to implement and what Ghostdoc, Eclipse or a technical writer forgets to mention.

The same applies to undocumented public functions in a class: If there were any at Trolltech, those functions were usually not ready for production use. Scanning for undocumented functions was a quick way to find functions with other problems.

Undocumented nulls and other edge cases are frequently unhandled; at Trolltech I had something to scan for pointer arguments where the word null wasn't mentioned in the same sentence as the argument. It found treasure quite often, but the text analysis was too poor to use it all the time, for example because the null was mentioned in a following sentence. Warnings have to be good, and quite often may or may not be enough.

I've never had a tool that parses both the implementation and the documentation. I wish I had one. It could look for differences and use them. Here are two examples:

When a caller and a callee disagree, the documentation status can be used to point at the most likely guilty function. Suppose x() calls y() and doesn't catch an exception y() may throw. Is that best reported as a problem in x() or y()? That depends on whether the documentation for y() mentions the exception.

If a function takes an enum argument, and the set of enum values mentioned in the code and in the documentation differ, then something is almost always wrong. It may be in the documentation, but in my experience it's more often in the code and sometimes in both.

Trolltech's documentation process

When I arrived at Trolltech, there wasn't much documentation for Qt. Some good intentions, some outdated Τεχ source, no plan for improvement. Having suffered from that (I was the world's first external Qt user and needed documentation) I set about writing a usable documentation tool and documentation. One of my jobs became Documentation Supremo. I wrote some/much documentation, wrote the tools, kept track of quality and gradually improved the way we produced our documentation.

I did not write all of the documentation, far from it. While I wrote much and looked at the rest, I could not have written all. It was essential that everyone who wrote code, also wrote documentation.

Our documentation process worked (we wrote documentation and it largely satisfied the readers), so in this post I attempt to document what we did and what made our process succeed. […More…]

Comparing javadoc with qdoc and doxygen

Qdoc misses many features Javadoc has. This is intentional — javadoc has many naïve or harmful features. Here's a braindump of the features I dropped or decided against adding.

@author is one thing Qt had, but I dropped it due to two problems. First, it misled users who used it to answer the question who should I ask about this problem?. The problem was typically about some recent change to the file, and the original author of the file was the wrong person to ask. Second, there was some reluctance within the team about editing someone else's code, which delayed bugfixes. […More…]

The history of udoc

The origin of udoc goes a long way back, to when I still was a student at the University of Trondheim, the world's first and only Quasar Toolkit user, and about to start working at Trolltech, which at the time was called Quasar Technologies (Hi Haavard) and occupied a room and a half overlooking a busy street in a rather unfashionable part of Oslo.

I wasn't very happy with the Qt documentation, which was then written using LaTeX macros and already obsolete. I was also an opinionated asshole and far too sure of myself, and I'd just learned about Donald Knuth's literate programming techniques, but I hadn't read his book. Naturally I looked at the existing litprog tools (there were quite a few) before discarding them all to write something good. […More…]

Udoc: The Name

For a long time udoc was called qdoc. Because Trolltech might want to release its qdoc (abbreviated from qt doc tool for no particularly good reason), it seemed sensible to rename this one before release. A simple matter — just step along to the next letter, I thought. But then there turned out to be something called rdoc already. And two programs called sdoc. tdoc was taken too. A few quick searches showed that there are more than 26 programs like udoc.

You can understand why I considered naming it fdoc.