On leaving things undocumented

I quote one of the Economist editors on their writing/linking policy: That’s the same that we do in the weekly as well — we’re not big on linking out. And it’s not because we’re luddites, or not because we don’t want to send traffic to other people. It’s that we don’t want to undermine the reassuring impression that if you want to understand Subject X, here’s an Economist article on it — read it and that’s what you need to know. And it’s not covered in links that invite you to go elsewhere. We’ll link to background, and we’ll link to things like white papers or scientific papers and stuff like that. The idea of a 600-word science story that explains a paper is that you only need to read the 600-word science story — you don’t actually have to fight your way through the paper. There is a distillation going on there. (Emphasis mine.)

The same applies to technical documentation: Stick to your subject and document that. Background information is often a digression. It adds bulk, it adds information that someone may appreciate, but quite often it adds no information that you want to convey.

How can you tell whether it's a digression in any single case? If you define the audience and goal of your text, then the goal typically answers that question. Either the text in question helps your specified audience towards the specified goal, or else it's a digression.

The digression may be something you should link to.

Writing class documentation

A mere ten years after I promised to write this: here's a post describing how to write decent class documentation, for programmers, not writers. This describes how I wrote class documentation for Qt.

This has four parts: Write a sentence-long blurb with the most important thing you can say about the class. Write a few more sentences to give a complete, but rough overview of what the class is (is, not does). Make a list of methods, member variables, enum values and other subordinates of the class, sort the list into aspects, then write about each aspect (but not each member). Finally, make zero or more examples. […More…]

Audience and goal

Written texts have two major invisible properties: audience and goal. I can't remember who taught me about that, but I taught it to my friend Abhijit Menon-Sen when we started working together, and the texts he and I have written together over the years always have a hidden comment describing the audience(s) and goal(s) for that text. That's why those texts are crisp and to the point.

Here's an example. Not by us together, this one is his alone.

Abhijit and his girlfriend Hassath have recently built a house, it's aaaaalmost done now, and a few weeks ago Abhijit wrote a blog posting about an electric power gadget they bought for the house. The audience for that posting consists of two groups of people: Friends who want to know how the house is coming along, and people who are searching for reviews of the gadget before possibly buying one themselves. The goal for the first group is to describe the power problems and how Abhijit and Hassath are coping, and for the second group, to tell them whether and how well that particular gadget helps with that kind of power problem.

Now please read about their amazingly unreliable power supply and consider how each sentence, each paragraph and each picture helps with either or both of those goals, in the eyes of those audiences. Does a sentence say something that both audiences already know, or does it tell either or both audiences something Abhijit wants to tell them? Does a sentence help with one goal but disturb the other? What does the photo achieve? Would mentioning the audiences or goals in the text help to achieve either of the goals, or would it distract or detract?

That posting may be stylistically vapid, but it achieves Abhijit's goals and that makes it good writing. The rest is a mere question of how good.

Now please start formulating an explicit audience and goal before you write your own email, documentation, almost anything. Help save the world from pointless blather and documentation that forgets the critical points.

Not about the iphone

A classic pundit failure: In 2006 Bill Ray wrote in The Register that the iphone would be a a complete failure. The article has been quoted as pundit failure ever since.

I'm not sure it actually was a failure. Perhaps the editor asked the author to provide a crowdpleaser: Give me lots of pageviews and facebook likes, make our readers' heads nod in agreement, it doesn't have to be factually right. If that's the case, then the article may have been a roaring success for all I know.

I'm going to pretend it wasn't like that, though, because it doesn't suit my purpose with this posting. This is about technical documentation, not crowdpleasing or pageviews. My purpose here is to illustrate a class of mistake in technical documentation. Bill Ray's article is a wonderfully vivid example.

The article has two main points. One of them (about the power of retailers over customers and manufacturers) is irrelevant to technical documentation, so I disregard it. Here's the core of the other point:

The clever design of the iPod stretched into the software — the clean and simple interface is indeed easy to use, and users seem very comfortable with iTunes on their PC. But creating a simple interface for a single function is one thing. Replicating that experience to manage all the functions of a mobile phone is another thing entirely. Mobile phones are not complex to use because of bad interface design, they are complex to use because they are complex devices with a myriad of features. […More…]

Progress in bug fixing

We fix the hardware bugs in the software and the software bugs in the documentation, a cynical developer once told me. As the world progresses, the food chain grows longer: ... and fix documentation bugs with a blog posting.

Parsing documentation to find bugs

This is mostly a footnote for my review of Knuth's Literate Programming book. Read that first.

Undocumented arguments often occur where an argument wasn't considered in the code, or where the design is wrong. 50-60% of undocumented arguments were also buggily handled at Trolltech.

That correlation depends on the documentation being written by hand and by programmers. Generating documentation like Ghostdoc does breaks the correlation completely, and starting with mostly-generated text like that from Eclipse also weakens it to uselessness. If you, as a programmer, forget something about the argument while implementing, but someone/something else writes the documentation, then there is little or no correlation between what you forget to implement and what Ghostdoc, Eclipse or a technical writer forgets to mention.

The same applies to undocumented public functions in a class: If there were any at Trolltech, those functions were usually not ready for production use. Scanning for undocumented functions was a quick way to find functions with other problems.

Undocumented nulls and other edge cases are frequently unhandled; at Trolltech I had something to scan for pointer arguments where the word null wasn't mentioned in the same sentence as the argument. It found treasure quite often, but the text analysis was too poor to use it all the time, for example because the null was mentioned in a following sentence. Warnings have to be good, and quite often may or may not be enough.

I've never had a tool that parses both the implementation and the documentation. I wish I had one. It could look for differences and use them. Here are two examples:

When a caller and a callee disagree, the documentation status can be used to point at the most likely guilty function. Suppose x() calls y() and doesn't catch an exception y() may throw. Is that best reported as a problem in x() or y()? That depends on whether the documentation for y() mentions the exception.

If a function takes an enum argument, and the set of enum values mentioned in the code and in the documentation differ, then something is almost always wrong. It may be in the documentation, but in my experience it's more often in the code and sometimes in both.

Literate Programming: the book

Donald Knuth's book Literate Programming is a collection of articles about what he did in the the TexBook and in various other programs. I read it around 1995, while working on qdoc, and thought it was terribly naïve. I reread it in portions this year (the same copy, which found its way to me from Trolltech's once-extensive library — better to me than wherever Nokia is going) and this time I wanted to write down my thoughts. I wish I'd had a blog in 1995.

The book is all about writing two things at the same time, in two separate languages, not connected, merely adjacent in a single file. The main point is that WEB is inherently bilingual, and that such a combination of languages proves to be much more powerful than either single language by itself. WEB does not make other languages obsolete; on the contrary, it enhances them. (p101) Yes, no, no. WEB does not enhance either of the two languages (relevantly, at least). The Τεχ source is just that, it doesn't receive anything from WEB. You cannot access the list of Pascal variables in the Τεχ part, […More…]

Man is the measure of all things

Le Corbusier said so. Noone seems to understand it.

A window sill is not 1m above the floor. A window is properly located so its centre is 1.0 eyeheights above the floor and its size is 1.0 visionwidths, because man is the measure of all things. A good architect resolves these difficult units appropriately to the building.

The same applies to doors, stair steps, GUI animation times and technical writing. The reader's comprehension is the measure, not logical structure or the writer's composition ideal.

Trolltech's documentation process

When I arrived at Trolltech, there wasn't much documentation for Qt. Some good intentions, some outdated Τεχ source, no plan for improvement. Having suffered from that (I was the world's first external Qt user and needed documentation) I set about writing a usable documentation tool and documentation. One of my jobs became Documentation Supremo. I wrote some/much documentation, wrote the tools, kept track of quality and gradually improved the way we produced our documentation.

I did not write all of the documentation, far from it. While I wrote much and looked at the rest, I could not have written all. It was essential that everyone who wrote code, also wrote documentation.

Our documentation process worked (we wrote documentation and it largely satisfied the readers), so in this post I attempt to document what we did and what made our process succeed. […More…]

FAQ handling

FAQs happen, and have to be handled. There are four ways. A sorted commented list (I promise this won't be a rant):

Do nothing. A valued approach, and there is much to say in its favour. For an opensource hacker who's basically writing code to scratch his own itch (using the male pronoun seems safe in this context) there's no intrinsic reason to care about FAQs at all. […More…]

Use the source wisely, Luke

No matter what the documentation says, the source code is the ultimate truth, the best and most definitive and up-to-date documentation you're likely to find. Sounds really good. But grand sentences often sound really good — better than reality even. So let's try checking that against the concrete code and documentation I personally have written. Have I written anything where the documentation describes the code better than the code itself does?

Here's what I could think of during the time it took to brew myself a pot of tea:

I have written code that was not supposed to be used, except by internal unit tests. For instance to get January 1, 1970 instead of the current time. Very useful for unit tests, but do you really want to depend on that kind of thing?

I have written documentation that described only what's common to the current and the the next major version (in Qt, I did that many, many times). Anyone who depended on the documented behaviour got a simple upgrade, anyone who read the source might get trouble, often […More…]

Comparing javadoc with qdoc and doxygen

Qdoc misses many features Javadoc has. This is intentional — javadoc has many naïve or harmful features. Here's a braindump of the features I dropped or decided against adding.

@author is one thing Qt had, but I dropped it due to two problems. First, it misled users who used it to answer the question who should I ask about this problem?. The problem was typically about some recent change to the file, and the original author of the file was the wrong person to ask. Second, there was some reluctance within the team about editing someone else's code, which delayed bugfixes. […More…]

Writing worthless documentation with zero effort

Those lucky enough to use Visual Studio need not work hard to document their code: Ghostdoc does it all automatically. No mental effort required. […More…]

The inverted pyramid

The inverted pyramid is how journalists are taught to write typical news stories. It puts the most newsworthy information at the top, and then the remaining information follows in order of importance, with the least important at the bottom. (The quote is from Chip Scanlan.)

This is the right way to write most API documentation. […More…]

Companion techniques

Another finding-my-thoughts posting. Documenting helps fix bugs early, but what are its companion techniques?

Polishing the code all the time. Make it look good. Beauty is truth, truth is beauty. […More…]

Reading someone's documentation

Written with doxygen. It's funny to see how early qdoc syntax is still there. Features I added because they seemed to make sense, then discarded later, when I saw they didn't carry their load.

And there they are, still in use. Maybe funny isn't the right word.

Udoc into clang

One of udoc's problems is that its error messages arrive towards the end of the build process. Often, when fifteen executables are being built, I wait for the interesting one to be finished and immediately switch to testing it. Later on udoc delivers some useful error messages, but I'm not looking any more. […More…]

Javadoc

Javadoc is built in to Java, but I think they botched it. It's clear that they didn't care deeply: JLS3 grammar doesn't mention javadoc at all, and the JLS doesn't specify it, hardly even mentions it… the word stepchild sounds more appropriate than does builtin, in my oh-so-humble opinion.

There are many things I don't like about the result, and few things I do like.

There's too much typing for not enough benefit. […More…]

Output formats for generated documentation

The four major output formats (long print, piecemeal print, ASCII on screen and hypertext) all have their fans, all have their uses, the ideal system would support all four, but it isn't possible to support all four really well.

Writing, designing or encoding for plain ASCII just isn't the same as writing for pages. One of the books I read ages ago had a very nice example, showing two pages of text vs. one screen of text. The pages allowed boxes that didn't intrude on reading the regular text, fine graphics, and easy overview of a lot of data. In all, around six times as much information in front of the reader's eyes. […More…]

Literate programming failed. Why?

Donald Knuth invented literate programming and published the TexBook as an example. The book is great, or so I've heard from many people who've read it. So why is literate programming is practically unused today, at least the kind Knuth invented? […More…]

Kinds of literate programming

The TeXbook employs something called literate programming: Knuth wrote code and text together, effectively writing a narrative about that code, with that code as part of the narrative.

Knuth could do that, he's a genius. He was able to write a sizable program practically without bugs, in a note-book. Mortals like myself could not. I'd have to go back and rewrite earlier bits, and before long the narrative would stink. […More…]

Writing (mostly) to write

Most people who talk about literate programming seem to care mostly about the output. The written documentation. I care more about the actual writing, and the result of the writing process.

When you write something, when you explain it, you gain a deeper insight yourself. That's a cliché, to be sure, but it can be leveraged to write better code. There are two parts to it: Helping yourself and having your tool help you. […More…]

API documentation using literate tools

API documentation is a particular subclass of literate programming. What makes it special?

First, its audience is diverse. Some readers know almost as much as the maintainers about the subject, others are rank beginners. Many know quite a bit about some parts of the subject and are almost ignorant of other parts. Some readers like to point and click, others prefer dead flat trees, others again prefer on-screen plain text such as man pages (I do, because I can type much faster than I can point and click). […More…]

udoc

Udoc is the name of my latest program to do literate programming, and I'm reusing it here, for writing about the subject. It is perhaps not a very good name, but I used it since the subject is such a many-faceted one:

API documentation is what I did at Trolltech. At least that's what I thought I was doing at the start. As I was developing the format and writing text, I found I could use the documentation for two other purposes:

Maintainer documentation is one half of what Abhijit and I do for Archiveopteryx.

Writing better code through writing documentation and thereby developing a better understanding.

Checking boxes is what I see much too often. There Shall Be Documentation, or I Will Document My Code; but the actual documentation is meaningless — doesn't answer any questions I have about the code or interface.

Literate programming tools can be used for all three. Some are better suited to one purpose, some to another. Disciplined usage can help reach a goal even if the program fails to help — I still think it's a failure, though.

The history of udoc

The origin of udoc goes a long way back, to when I still was a student at the University of Trondheim, the world's first and only Quasar Toolkit user, and about to start working at Trolltech, which at the time was called Quasar Technologies (Hi Haavard) and occupied a room and a half overlooking a busy street in a rather unfashionable part of Oslo.

I wasn't very happy with the Qt documentation, which was then written using LaTeX macros and already obsolete. I was also an opinionated asshole and far too sure of myself, and I'd just learned about Donald Knuth's literate programming techniques, but I hadn't read his book. Naturally I looked at the existing litprog tools (there were quite a few) before discarding them all to write something good. […More…]

On documentation and literate programming, etc.

Over the next many months, I'm going to write a long series of semirelated postings on source code documentation, covering the techniques I developed at Trolltech and extended for Archiveopteryx, what other people are doing and what I think of it all.

Update: Done, and below are links to the major postings.

What I write about is largely API documentation, although often for internal readership. Literate programming as conceived by Knuth and described in the book Literate Programming is closer to implementing a program and writing a user manual together. Knuth wrote something brilliant in one long sitting (OK, with pauses to sleep), my topic is writing a large number of small blocks that together describe an evolving software complex, and use tooling to combine human-written text with other input to produce optimal output. Most programmers aren't as clever as Knuth, myself included, and because of that I think literate programming isn't used any more.

That is, literate programming in Knuth's sense. Some people describe the kind of writing supported by javadoc, doxygen etc. as literate programming, because it does mix code and documentation. Knuth might disagree, but he doesn't have a unique right to define the term. I personally don't really care.

Writing isn't merely a way to produce words; the writing process is valuable in itself, and the toolchain can produce more than just text. In my opinion all tools used by software developers can and should produce diagnostics to help produce the software. A compiler shouldn't just produce .o files of today's source code, it should help the developers reach the goal of bug-free software faster. Ditto for documentation tools.

Documentation is related to FAQ lists and support questions and some people think the source is documentation, I'm not of that opinion.

Most documentation processors emit HTML. Other formats do exist, though, each with its own characteristics. Some emit garbage.

Finally, I have three more longish posts about the writing process, about how to write a single documentary description, how to write anything factual and how Trolltech's documentation process evolved and worked, a blog posting about how to write class documentation and two about tools I've written, about their history and evolution and comparing my tools to doxygen and javadoc.

Udoc: The Name

For a long time udoc was called qdoc. Because Trolltech might want to release its qdoc (abbreviated from qt doc tool for no particularly good reason), it seemed sensible to rename this one before release. A simple matter — just step along to the next letter, I thought. But then there turned out to be something called rdoc already. And two programs called sdoc. tdoc was taken too. A few quick searches showed that there are more than 26 programs like udoc.

You can understand why I considered naming it fdoc.

Too Much Documentation

This is a bit of a rant.

I claim that, for any single page in a documentation set, whether it's pages in a book, web pages on a documentation site or even unix man pages, one of these three cases holds: […More…]