Arnt Gulbrandsen

Writing class documentation

A mere ten years after I promised to write this: here's a post describing how to write decent class documentation, for programmers, not writers. This describes how I wrote class documentation for Qt.

This has four parts: Write a sentence-long blurb with the most important thing you can say about the class. Write a few more sentences to give a complete, but rough overview of what the class is (is, not does). Make a list of methods, member variables, enum values and other subordinates of the class, sort the list into aspects, then write about each aspect (but not each member). Finally, make zero or more examples.

Good writers will do it differently, but the goal here is to write something good, not to be a good writer. The best is the enemy of the good.

You don't want to ship crap, and you also don't want to set the requirements so high that only some mythical other people could do the job. So let me digress first into what good means.

Good means that it helps the reader understand your class and how to use it for the reader's purpose. You don't know what that purpose is, but you can guess fairly well, because you are a software developer and familiar with the subject. If anyone can guess well, then it's you.

Sometimes the reader is someone on your own team, working on the same software. In that case the reader will be used to reading and editing the source code. In other cases you're supplying a black box that most readers will use via rubygems, maven/gradle, go get or similar repositories. Those readers never see the source code, they never see the unit tests. When you describe a class in five words or fifty, the description you choose often depends on who you are talking to. That is, the meaning of good depends on the audience.

Both variants of good are well within your capability as a software developer, though. You've done both things verbally, probably many, many times, and you succeeded at getting the point across. Doing it in writing isn't much more difficult.

Back to writing good class documentation:

Parts 1+2: The thirty-second description that'll tell people what this class really is. As with documenting functions, doing this helps you clarify your thinking. Writing a few sentences to give an overview of what the class is improves your own overview.

The simple way to write this is to write a single sentence about the class. A single short sentence, it doesn't have to be complete, it should be simple, understandable and not wrong. Then you look at the sentence, think about what is important and left out, and write each important thing as a single sentence, or maybe as two. This writing style is called the inverted pyramid, and good journalists use it a great deal.

When you write this description, it's best to explain what the class is, rather than what it does. If your phrasing slides towards does, your result often becomes less valuable for the readers, who are focused on what their own code does. A function does, but a class is, and if you keep to is, your description tends to be readable for a wider range of readers with a wider range of use cases.

The detailed description of QWidget is a long example; read the first six paragraphs. The first sentence is the gist, after that each sentence or pair of sentences describes some additional important aspect. By the time you've read all six paragraphs you'll generally know whether QWidget is the class you need to inherit, or what the subclasses have in common.

QPushButton is a shorter example; only three lines of text are enough to describe what that is. The first sentence says what it is. A second, clarifying sentence introduces both of the names readers may have read before. It also suggests the kind of use case for which this is suited, because picking the wrong kind of button is such a common mistake. A third sentence names some good and common examples, in case the reader has used buttons but knows little terminology.

That's it. Three sentences, and the reader has a fair idea about whether this is the right class to use. QWidget needs a great deal more. Both things happen in practice. Most classes are well documented using a three-sentence description, some really need much more.

Of course, as you can see there are more sentences after the three. That's okay, it's not difficult to write, because those are sentences about the details, not about the class. I (or we?) wrote three sentences about this, two or three sentences about that, and both parts were simple because writing three sentences is much simpler than writing a long text.

Part 3: Filling in the right amount of details to tell people how to put this class to use.

The rest of the text about QPushButton is a sequence of short texts, each about some aspect of the class. We wrote those by making a list of enum values and member functions, eliminating duplicates (e.g. accel()/setAccel()) and sorting the result into a list of aspects. The enum ToggleType isn't an aspect of the class, for example, but the three values of that enum were used to make the list of aspects. Then we wrote approximately one paragraph per aspect, sometimes a little more.

The text looks a little blocky. A trained copy editor can easily see that each paragraph was written mostly on its own. On the other hand, a trained software developer can read this documentation and use the class well.
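As an illustration of parts 1 to 3, here is a minimal sketch of a documented class. Everything in it is hypothetical: Thermostat is not a Qt class, and the comment markup is only meant to resemble the \class and \brief commands of qdoc-style tools. The first sentence is the blurb, the next paragraph says what the class is, and each following paragraph covers one aspect (target temperature, mode, demand) rather than one member.

```cpp
#include <cassert>

/*!
  \class Thermostat
  \brief The Thermostat class is a setpoint controller for one heating or cooling zone.

  A thermostat is a target temperature plus a mode. It is passive: it owns
  no timer and no sensor, and its caller tells it the measured temperature.

  The target temperature is a value in degrees Celsius. target() returns it
  and setTarget() changes it. The default is 20.

  The mode is one of Off, Heat and Cool. In Off mode the thermostat never
  demands anything, in Heat mode it demands heat below the target, and in
  Cool mode it demands cooling above the target.

  demand() is the question most callers ask: given a measured temperature,
  should the attached equipment be running right now?
*/
class Thermostat {
public:
    enum Mode { Off, Heat, Cool };

    double target() const { return target_; }
    void setTarget(double celsius) { target_ = celsius; }

    Mode mode() const { return mode_; }
    void setMode(Mode m) { mode_ = m; }

    // True if the equipment should run, given the measured temperature.
    bool demand(double measured) const {
        if (mode_ == Heat) return measured < target_;
        if (mode_ == Cool) return measured > target_;
        return false; // Off never demands anything
    }

private:
    double target_ = 20.0;
    Mode mode_ = Off;
};
```

Note that the enum Mode gets no paragraph of its own, and neither does each getter/setter pair. The enum's values and the paired accessors were only used to find the three aspects.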

Part 4: Examples

After that, consider what examples the reader should see. Do not make the mistake we made in the early years of Trolltech, and write code that interests you. Our hello, world eventually morphed into an example of how to make colourful animated letters jump around the screen. Fine and impressive, but not helpful for a reader who is looking for how to write a first hello, world.

What you should do is look at the documentation you've written, and if the class merits an example at all, you decide what example the reader would ask you for, and write that precise example.

QPushButton contains an excellent single-line example. (You may have noticed that even though it was written at the end, we moved it up to nearly the start of the text.)

QWidget is used by many example programs, so it contains a three-sentence description of one example program, explaining why that particular example is good to start with.

QPushButton also contains automatically generated links to complete example programs that use it; our documentation tool (qdoc) generated those. In my opinion, each of those example programs should have a declared audience and goal. You should make clear (in a comment in each example program) both what this example program is supposed to teach and who should learn from it. We didn't at Trolltech, which harmed both the example programs and us as a team.

After that: Editing helps. Editing, in this case, mostly means moving the blocks around. You can see that we moved the example in QPushButton near the top. That just looked right. Looking at QWidget, you can see that the events are grouped. I wrote the text for each group independently, then we looked at it and decided that the exposition worked best if the descriptions were ordered in that order.

Real editors do more, of course. This posting is about writing well enough to enable software developers to use a class well, not about writing better than that.

After that: Refining the documentation due to FAQs helps. When we answered support questions, when we answered FAQs, anything, we'd try to edit something a little to make that FAQ go away. That's why the QPushButton text points to the other kinds of buttons (people have terrible problems deciding whether a tool button, command button or radio button is the best for a given case, and so on), and it's why the QWidget text mentions top-level widgets so prominently.


This posting hardly mentions tools. Good documentation benefits from good tooling, but not all parts equally. The subject of this posting is text that has to come from someone's mind. Other parts can come from analysing e.g. relationships in the source code.

Some tools aim at producing the best possible result from whatever effort you put in. That's not what I write about. My goal is strictly to produce usable results with as little human effort as possible.

The links above point to a very old version of Qt. Later versions are perhaps better documented, but for this posting I thought it suitable to use a version written by software developers rather than by technical writers. The users praised it. People like you can write documentation that users praise.

Audience and goal

Written texts have two major invisible properties: audience and goal. I can't remember who taught me about that, but I taught it to my friend Abhijit Menon-Sen when we started working together, and the texts he and I have written together over the years always have a hidden comment describing the audience(s) and goal(s) for that text. That's why those texts are crisp and to the point.

Here's an example. Not by us together, this one is his alone.

Abhijit and his girlfriend Hassath have recently built a house, it's aaaaalmost done now, and a few weeks ago Abhijit wrote a blog posting about an electric power gadget they bought for the house. The audience for that posting consists of two groups of people: Friends who want to know how the house is coming along, and people who are searching for reviews of the gadget before possibly buying one themselves. The goal for the first group is to describe the power problems and how Abhijit and Hassath are coping, and for the second group, to tell them whether and how well that particular gadget helps with that kind of power problem.

Now please read about their amazingly unreliable power supply and consider how each sentence, each paragraph and each picture helps with either or both of those goals, in the eyes of those audiences. Does a sentence say something that both audiences already know, or does it tell either or both audiences something Abhijit wants to tell them? Does a sentence help with one goal but disturb the other? What does the photo achieve? Would mentioning the audiences or goals in the text help to achieve either of the goals, or would it distract or detract?

That posting may be stylistically vapid, but it achieves Abhijit's goals and that makes it good writing. The rest is a mere question of how good.

Now please start formulating an explicit audience and goal before you write your own email, documentation, almost anything. Help save the world from pointless blather and documentation that forgets the critical points.

Not about the iphone

A classic pundit failure: In 2006 Bill Ray wrote in The Register that the iphone would be a complete failure. The article has been quoted as pundit failure ever since.

I'm not sure it actually was a failure. Perhaps the editor asked the author to provide a crowdpleaser: Give me lots of pageviews and facebook likes, make our readers' heads nod in agreement, it doesn't have to be factually right. If that's the case, then the article may have been a roaring success for all I know.

I'm going to pretend it wasn't like that, though, because it doesn't suit my purpose with this posting. This is about technical documentation, not crowdpleasing or pageviews. My purpose here is to illustrate a class of mistake in technical documentation. Bill Ray's article is a wonderfully vivid example.

The article has two main points. One of them (about the power of retailers over customers and manufacturers) is irrelevant to technical documentation, so I disregard it. Here's the core of the other point:

The clever design of the iPod stretched into the software — the clean and simple interface is indeed easy to use, and users seem very comfortable with iTunes on their PC. But creating a simple interface for a single function is one thing. Replicating that experience to manage all the functions of a mobile phone is another thing entirely. Mobile phones are not complex to use because of bad interface design, they are complex to use because they are complex devices with a myriad of features. […More…]

Parsing documentation to find bugs

This is mostly a footnote for my review of Knuth's Literate Programming book. Read that first.

Undocumented arguments often occur where an argument wasn't considered in the code, or where the design is wrong. 50-60% of undocumented arguments were also buggily handled at Trolltech.

That correlation depends on the documentation being written by hand and by programmers. Generating documentation like Ghostdoc does breaks the correlation completely, and starting with mostly-generated text like that from Eclipse also weakens it to uselessness. If you, as a programmer, forget something about the argument while implementing, but someone/something else writes the documentation, then there is little or no correlation between what you forget to implement and what Ghostdoc, Eclipse or a technical writer forgets to mention.

The same applies to undocumented public functions in a class: If there were any at Trolltech, those functions were usually not ready for production use. Scanning for undocumented functions was a quick way to find functions with other problems.

Undocumented nulls and other edge cases are frequently unhandled; at Trolltech I had something to scan for pointer arguments where the word null wasn't mentioned in the same sentence as the argument. It found treasure quite often, but the text analysis was too poor to use it all the time, for example because the null was mentioned in a following sentence. Warnings have to be good, and "quite often" may or may not be good enough.
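The scan described above could be sketched roughly as follows. This is not the Trolltech tool, only a minimal reconstruction under stated assumptions: the function name argsWithoutNull and its helpers are hypothetical, the caller is assumed to already know which arguments are pointers, and sentences are split naively on '.', '!' and '?'.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Lower-case copy, so that "Null" and "null" compare equal.
static std::string lowered(std::string s) {
    for (char &c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// True if word occurs in text with no letter, digit or '_' on either side.
static bool containsWord(const std::string &text, const std::string &word) {
    auto isWordChar = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    };
    for (std::size_t pos = text.find(word); pos != std::string::npos;
         pos = text.find(word, pos + 1)) {
        const std::size_t end = pos + word.size();
        const bool startOk = pos == 0 || !isWordChar(text[pos - 1]);
        const bool endOk = end == text.size() || !isWordChar(text[end]);
        if (startOk && endOk)
            return true;
    }
    return false;
}

// Naive sentence splitter: a sentence ends at '.', '!' or '?'.
static std::vector<std::string> sentences(const std::string &doc) {
    std::vector<std::string> out;
    std::string current;
    for (char c : doc) {
        current += c;
        if (c == '.' || c == '!' || c == '?') {
            out.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        out.push_back(current);
    return out;
}

// Returns the pointer arguments that never share a sentence with "null".
std::vector<std::string> argsWithoutNull(
        const std::string &doc, const std::vector<std::string> &pointerArgs) {
    std::vector<std::string> missing;
    const std::vector<std::string> parts = sentences(lowered(doc));
    for (const std::string &arg : pointerArgs) {
        bool covered = false;
        for (const std::string &s : parts) {
            if (containsWord(s, lowered(arg)) && containsWord(s, "null")) {
                covered = true;
                break;
            }
        }
        if (!covered)
            missing.push_back(arg);
    }
    return missing;
}
```

For the text "Copies src into dst. If src is null, nothing is copied." and the arguments src and dst, the sketch reports dst. It also shows the weakness mentioned above: for "Appends item to the list. Null is ignored." it reports item, because the null is in the following sentence.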

I've never had a tool that parses both the implementation and the documentation. I wish I had one. It could look for differences and use them. Here are two examples:

When a caller and a callee disagree, the documentation status can be used to point at the most likely guilty function. Suppose x() calls y() and doesn't catch an exception y() may throw. Is that best reported as a problem in x() or y()? That depends on whether the documentation for y() mentions the exception.

If a function takes an enum argument, and the set of enum values mentioned in the code and in the documentation differ, then something is almost always wrong. It may be in the documentation, but in my experience it's more often in the code and sometimes in both.
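The enum comparison can be sketched as two set differences. This is only an illustration of the idea, since no such tool is described above: compareEnumMentions and EnumMismatch are hypothetical names, and the hard part (extracting the two sets of names from the implementation and from the documentation) is assumed to have been done already.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>

// Enum values that appear on only one side.
struct EnumMismatch {
    std::set<std::string> onlyInCode; // handled in code, never documented
    std::set<std::string> onlyInDocs; // documented, never handled in code
    bool empty() const { return onlyInCode.empty() && onlyInDocs.empty(); }
};

// Compares the enum values a function's code mentions with the ones its
// documentation mentions. std::set iterates in sorted order, which is what
// std::set_difference requires.
EnumMismatch compareEnumMentions(const std::set<std::string> &inCode,
                                 const std::set<std::string> &inDocs) {
    EnumMismatch m;
    std::set_difference(inCode.begin(), inCode.end(),
                        inDocs.begin(), inDocs.end(),
                        std::inserter(m.onlyInCode, m.onlyInCode.end()));
    std::set_difference(inDocs.begin(), inDocs.end(),
                        inCode.begin(), inCode.end(),
                        std::inserter(m.onlyInDocs, m.onlyInDocs.end()));
    return m;
}
```

A non-empty result in either direction is the signal worth a warning. Which side is wrong still has to be decided by a person.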

Literate Programming: the book

Donald Knuth's book Literate Programming is a collection of articles about what he did in the TeXbook and in various other programs. I read it around 1995, while working on qdoc, and thought it was terribly naïve. I reread it in portions this year (the same copy, which found its way to me from Trolltech's once-extensive library — better to me than wherever Nokia is going) and this time I wanted to write down my thoughts. I wish I'd had a blog in 1995.

The book is all about writing two things at the same time, in two separate languages, not connected, merely adjacent in a single file. "The main point is that WEB is inherently bilingual, and that such a combination of languages proves to be much more powerful than either single language by itself. WEB does not make other languages obsolete; on the contrary, it enhances them." (p101) Yes, no, no. WEB does not enhance either of the two languages (relevantly, at least). The Τεχ source is just that, it doesn't receive anything from WEB. You cannot access the list of Pascal variables in the Τεχ part, […More…]

Man is the measure of all things

Le Corbusier said so. No one seems to understand it.

A window sill is not 1m above the floor. A window is properly located so its centre is 1.0 eyeheights above the floor and its size is 1.0 visionwidths, because man is the measure of all things. A good architect resolves these difficult units appropriately to the building.

The same applies to doors, stair steps, GUI animation times and technical writing. The reader's comprehension is the measure, not logical structure or the writer's composition ideal.

Trolltech's documentation process

When I arrived at Trolltech, there wasn't much documentation for Qt. Some good intentions, some outdated Τεχ source, no plan for improvement. Having suffered from that (I was the world's first external Qt user and needed documentation) I set about writing a usable documentation tool and documentation. One of my jobs became Documentation Supremo. I wrote some/much documentation, wrote the tools, kept track of quality and gradually improved the way we produced our documentation.

I did not write all of the documentation, far from it. While I wrote much and looked at the rest, I could not have written all. It was essential that everyone who wrote code, also wrote documentation.

Our documentation process worked (we wrote documentation and it largely satisfied the readers), so in this post I attempt to document what we did and what made our process succeed. […More…]

FAQ handling

FAQs happen, and have to be handled. There are four ways. A sorted commented list (I promise this won't be a rant):

Do nothing. A valued approach, and there is much to say in its favour. For an opensource hacker who's basically writing code to scratch his own itch (using the male pronoun seems safe in this context) there's no intrinsic reason to care about FAQs at all. […More…]

Use the source wisely, Luke

No matter what the documentation says, the source code is the ultimate truth, the best and most definitive and up-to-date documentation you're likely to find. Sounds really good. But grand sentences often sound really good — better than reality even. So let's try checking that against the concrete code and documentation I personally have written. Have I written anything where the documentation describes the code better than the code itself does?

Here's what I could think of during the time it took to brew myself a pot of tea:

I have written code that was not supposed to be used, except by internal unit tests. For instance to get January 1, 1970 instead of the current time. Very useful for unit tests, but do you really want to depend on that kind of thing?

I have written documentation that described only what's common to the current and the next major version (in Qt, I did that many, many times). Anyone who depended on the documented behaviour got a simple upgrade, anyone who read the source might get trouble, often […More…]

Comparing javadoc with qdoc and doxygen

Qdoc misses many features Javadoc has. This is intentional — javadoc has many naïve or harmful features. Here's a braindump of the features I dropped or decided against adding.

@author is one thing Qt had, but I dropped it due to two problems. First, it misled users who used it to answer the question "who should I ask about this problem?" The problem was typically about some recent change to the file, and the original author of the file was the wrong person to ask. Second, there was some reluctance within the team about editing someone else's code, which delayed bugfixes. […More…]

The inverted pyramid

The inverted pyramid is how journalists are taught to write typical news stories. It puts the most newsworthy information at the top, and then the remaining information follows in order of importance, with the least important at the bottom. (The quote is from Chip Scanlan.)

This is the right way to write most API documentation. […More…]

Reading someone's documentation

Written with doxygen. It's funny to see how early qdoc syntax is still there. Features I added because they seemed to make sense, then discarded later, when I saw they didn't carry their load.

And there they are, still in use. Maybe funny isn't the right word.

Udoc into clang

One of udoc's problems is that its error messages arrive towards the end of the build process. Often, when fifteen executables are being built, I wait for the interesting one to be finished and immediately switch to testing it. Later on udoc delivers some useful error messages, but I'm not looking any more. […More…]

Javadoc

Javadoc is built in to Java, but I think they botched it. It's clear that they didn't care deeply: JLS3 grammar doesn't mention javadoc at all, and the JLS doesn't specify it, hardly even mentions it… the word stepchild sounds more appropriate than does builtin, in my oh-so-humble opinion.

There are many things I don't like about the result, and few things I do like.

There's too much typing for not enough benefit. […More…]

Output formats for generated documentation

The four major output formats (long print, piecemeal print, ASCII on screen and hypertext) all have their fans, all have their uses, the ideal system would support all four, but it isn't possible to support all four really well.

Writing, designing or encoding for plain ASCII just isn't the same as writing for pages. One of the books I read ages ago had a very nice example, showing two pages of text vs. one screen of text. The pages allowed boxes that didn't intrude on reading the regular text, fine graphics, and easy overview of a lot of data. In all, around six times as much information in front of the reader's eyes. […More…]

Literate programming failed. Why?

Donald Knuth invented literate programming and published the TeXbook as an example. The book is great, or so I've heard from many people who've read it. So why is literate programming practically unused today, at least the kind Knuth invented? […More…]

Kinds of literate programming

The TeXbook employs something called literate programming: Knuth wrote code and text together, effectively writing a narrative about that code, with that code as part of the narrative.

Knuth could do that, he's a genius. He was able to write a sizable program practically without bugs, in a note-book. Mortals like myself could not. I'd have to go back and rewrite earlier bits, and before long the narrative would stink. […More…]

Writing (mostly) to write

Most people who talk about literate programming seem to care mostly about the output. The written documentation. I care more about the actual writing, and the result of the writing process.

When you write something, when you explain it, you gain a deeper insight yourself. That's a cliché, to be sure, but it can be leveraged to write better code. There are two parts to it: Helping yourself and having your tool help you. […More…]

API documentation using literate tools

API documentation is a particular subclass of literate programming. What makes it special?

First, its audience is diverse. Some readers know almost as much as the maintainers about the subject, others are rank beginners. Many know quite a bit about some parts of the subject and are almost ignorant of other parts. Some readers like to point and click, others prefer dead flat trees, others again prefer on-screen plain text such as man pages (I do, because I can type much faster than I can point and click). […More…]

udoc

Udoc is the name of my latest program to do literate programming, and I'm reusing it here, for writing about the subject. It is perhaps not a very good name, but I used it since the subject is such a many-faceted one:

API documentation is what I did at Trolltech. At least that's what I thought I was doing at the start. As I was developing the format and writing text, I found I could use the documentation for two other purposes:

Maintainer documentation is one half of what Abhijit and I do for Archiveopteryx.

Writing better code through writing documentation and thereby developing a better understanding.

Checking boxes is what I see much too often. There Shall Be Documentation, or I Will Document My Code; but the actual documentation is meaningless — doesn't answer any questions I have about the code or interface.

Literate programming tools can be used for all three. Some are better suited to one purpose, some to another. Disciplined usage can help reach a goal even if the program fails to help — I still think it's a failure, though.

The history of udoc

The origin of udoc goes a long way back, to when I still was a student at the University of Trondheim, the world's first and only Quasar Toolkit user, and about to start working at Trolltech, which at the time was called Quasar Technologies (Hi Haavard) and occupied a room and a half overlooking a busy street in a rather unfashionable part of Oslo.

I wasn't very happy with the Qt documentation, which was then written using LaTeX macros and already obsolete. I was also an opinionated asshole and far too sure of myself, and I'd just learned about Donald Knuth's literate programming techniques, but I hadn't read his book. Naturally I looked at the existing litprog tools (there were quite a few) before discarding them all to write something good. […More…]

Arnt rants and rambles on literate programming etc.

Over the next many months, I'm going to write a long series of semirelated postings on source code documentation, covering the techniques I developed at Trolltech and extended for Archiveopteryx, what other people are doing and what I think of it all.

Update: Done, and below are links to the major postings.

What I write about is largely API documentation, although often for internal readership. Literate programming as conceived by Knuth and described in the book Literate Programming is closer to implementing a program and writing a user manual together. Knuth wrote something brilliant in one long sitting (OK, with pauses to sleep); my topic is writing a large number of small blocks that together describe an evolving software complex, and using tooling to combine human-written text with other input to produce optimal output. Most programmers aren't as clever as Knuth, myself included, and because of that I think literate programming isn't used any more (i.e. literate programming in Knuth's sense).

Writing isn't merely a way to produce words; the writing process is valuable in itself, and the toolchain can produce more than just text. In my opinion all tools used by software developers can and should produce diagnostics to help produce the software. A compiler shouldn't just produce .o files of today's source code, it should help the developers reach the goal of bug-free software faster. Ditto for documentation tools.

Most documentation processors emit HTML. Other formats do exist, though, each with its own characteristics. Some emit garbage.

Finally, I have two more longish posts about the writing process, one about how to write a single documentary description and one about how Trolltech's documentation process evolved and worked, and two about the tools I've written, one about their history and evolution and one that compares my tools to doxygen and javadoc.

Udoc: The Name

For a long time udoc was called qdoc. Because Trolltech might want to release its qdoc (abbreviated from qt doc tool for no particularly good reason), it seemed sensible to rename this one before release. A simple matter — just step along to the next letter, I thought. But then there turned out to be something called rdoc already. And two programs called sdoc. tdoc was taken too. A few quick searches showed that there are more than 26 programs like udoc.

You can understand why I considered naming it fdoc.

Too Much Documentation

This is a bit of a rant.

I claim that, for any single page in a documentation set, whether it's pages in a book, web pages on a documentation site or even unix man pages, one of these three cases holds: […More…]