Parsing documentation to find bugs

This is mostly a footnote for my review of Knuth's Literate Programming book. Read that first.

Undocumented arguments often occur where an argument wasn't considered in the code, or where the design is wrong. 50-60% of undocumented arguments were also buggily handled at Trolltech.

That correlation depends on the documentation being written by hand and by programmers. Generating documentation like Ghostdoc does breaks the correlation completely, and starting with mostly-generated text like that from Eclipse also weakens it to uselessness. If you, as a programmer, forget something about the argument while implementing, but someone/something else writes the documentation, then there is little or no correlation between what you forget to implement and what Ghostdoc, Eclipse or a technical writer forgets to mention.

The same applies to undocumented public functions in a class: If there were any at Trolltech, those functions were usually not ready for production use. Scanning for undocumented functions was a quick way to find functions with other problems.

Undocumented nulls and other edge cases are frequently unhandled; at Trolltech I had something to scan for pointer arguments where the word null wasn't mentioned in the same sentence as the argument. It found treasure quite often, but the text analysis was too poor to use it all the time, for example because the null was mentioned in a following sentence. Warnings have to be good, and quite often may or may not be enough.

I've never had a tool that parses both the implementation and the documentation. I wish I had one. It could look for differences and use them. Here are two examples:

When a caller and a callee disagree, the documentation status can be used to point at the most likely guilty function. Suppose x() calls y() and doesn't catch an exception y() may throw. Is that best reported as a problem in x() or y()? That depends on whether the documentation for y() mentions the exception.

If a function takes an enum argument, and the set of enum values mentioned in the code and in the documentation differ, then something is almost always wrong. It may be in the documentation, but in my experience it's more often in the code and sometimes in both.