Arnt Gulbrandsen

Output formats for generated documentation

The four major output formats (long print, piecemeal print, ASCII on screen and hypertext) all have their fans and all have their uses. The ideal system would support all four, but it isn't possible to support all four really well.

Writing, designing or encoding for plain ASCII just isn't the same as writing for pages. One of the books I read ages ago had a very nice example, showing two pages of text vs. one screen of text. The pages allowed boxes that didn't intrude on reading the regular text, fine graphics, and easy overview of a lot of data. In all, around six times as much information in front of the reader's eyes.

But of course it isn't that simple. A reader of plain text can grep, for example, and a reader of hypertext can follow links. Following a link may not be as convenient as glancing over at a figure on the facing page, but it can also be better, depending on the contents of the text linked from and that linked to.

No single format suits all readers all of the time, and no single format suits all of the inputs all of the time.

For documentation generators, the challenge is to produce output that uses as much of each format's strengths as possible, and compensates for its weaknesses.

Some think that for writers, the challenge is to write in a way that uses the format well. I think that for writers, the challenge is to write at all. The best is the enemy of the good. I don't worry about whether a text is short enough for on-screen use or long enough to really exploit paper; I worry about whether it's written at all, and if written, whether it's incomplete and/or obsolete.

Long printed documentation

An underused format at the moment.

Its primary advantage is that there's lots of space. If you can put two A4 pages in front of the reader's eyes, you can show an enormous amount of information, and you can make hundreds of two-page spreads.

Balancing that are the problems of linking well, of finding a sequence, and of making a good index. In HTML you can just make a link somewhere and readers will follow it or not (it may be a bad link, but people are used to that and generally don't complain). Putting twelve instances of "see page 314" into a single paragraph of text is less popular. Also, the pages should generally be readable when sorted by page number: page 1 first, then page 2, then page 3, … page 448. Making a readable overall order can be very difficult for generated source code documentation. Last but not least, making a good index is something I haven't even tried to do. I have no doubt that an automatic index generator can work much better than the ones I've seen, but can it be done well? I don't know.

On paper, some kinds of links can be turned into "see page 128", but others are better handled by turning them into a footnote or a "see inset box" note. Heuristics are needed.
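A minimal sketch of what such a heuristic might look like, in Python. Everything here is an assumption made for illustration: the `Link` record, the three link kinds, and the limit of two page references per paragraph are hypothetical and not taken from any real documentation generator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Link:
    text: str                   # the link text as it appears in the prose
    kind: str                   # hypothetical kinds: "internal", "external", "box"
    target: str                 # URL, or name of the documented item
    page: Optional[int] = None  # page of the target, if known at layout time


def render_for_paper(link: Link, refs_in_paragraph: int, max_refs: int = 2) -> str:
    """Choose a paper rendering for one link.

    refs_in_paragraph is the number of page references already emitted
    in the current paragraph; beyond max_refs the link text is left bare.
    """
    if link.kind == "external":
        # Links to the wider web become footnotes.
        return f"{link.text} (footnote: {link.target})"
    if link.kind == "box":
        return f"{link.text} (see inset box)"
    if link.page is not None and refs_in_paragraph < max_refs:
        return f"{link.text} (see page {link.page})"
    # Too many references already, or no known page: keep the text only.
    return link.text
```

A real generator would need more inputs than this (how far away the target is, whether it was referenced recently, and so on), but the shape is the same: a small decision function per link.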

Several companies (e.g. Books on Demand and Vervante) will print and deliver such documentation nicely bound.

Books generally use both sides of the page and are bound, so the unit is fairly inflexible. The user can see two pages at once very easily, but four not at all.

Piecemeal printout

Similar to book format in many ways, except that you can spend even more space. A feature which adds one page to a four-page printout offends no one; if the same feature turns a 400-page book into a 500-page one, users may be less happy.

Links have to be handled with care. Links to the greater web can be handled as footnotes, links to other parts of the documentation can sometimes be handled as inset boxes, but often not. In many cases, the best that can be done is to indicate that the link text is special, for instance by using a special font for class or file names, or naming the target parenthetically if the link text doesn't. Of course, too many parens degrade readability.

It can be difficult to know what should be printed and what not. To take the inset box as an example: if a particular box would be included in most documentation units, and a reader prints four units, will he/she get four identical inset boxes?
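One possible answer, sketched in Python: if the units are printed together as a single job, the generator can print each box in full only the first time it appears. The function name and the data shape (unit name plus a list of box identifiers) are hypothetical. The sketch does nothing for a reader who prints the four units on four separate occasions, since the generator cannot know what is already on the desk.

```python
from typing import Dict, List, Tuple


def plan_inset_boxes(units: List[Tuple[str, List[str]]]) -> Dict[str, List[str]]:
    """For one print job, decide which boxes each unit prints in full.

    units is a list of (unit name, box ids used by that unit), in print
    order. A box is printed in full only in the first unit that uses it.
    """
    seen = set()
    plan: Dict[str, List[str]] = {}
    for unit, boxes in units:
        fresh = []
        for box in boxes:
            if box not in seen:
                seen.add(box)
                fresh.append(box)
        plan[unit] = fresh
    return plan
```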

Piecemeal printouts will generally be single-sided and not bound, so two-page spreads are less likely, but the reader can spread lots of pages out to cover the whole desk.

ASCII on screen

My favourite format often. It's so very quick to access man pages if you know the page name.

I like it because I can pull up the man page and issue a search command in less than a second. It's very quick to move from “want” to “read”.

ASCII text is also eminently easy to cut and paste, more so than even HTML. (Try cutting and pasting the previous paragraph. Were the quotation marks preserved or lost?)

The disadvantage of ASCII is that almost no typography or graphics features can be used. No italics, no inset graphs. And no links.

The lack of graphics sounds worse than it is. Generated source code documentation contains very few graphs or images, often exactly zero. Typography is also less important than it sounds; it may be nice to have a monospaced font for code and a larger font size for headlines, but it's not exactly vital.

Web page or other hypertext

Links. Links are good.

Unfortunately the web supports only a single kind of link, unless you use JavaScript or some textual convention, and documentation often contains several kinds: references to inset tables, graphics or boxes, references to examples for the code being documented, references to APIs used in an example, references to other related APIs, and links to the wider web. It can be difficult to insert the right kinds of links, and mark them well, without making the output too link-laden.
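One convention that needs no scripting is to mark each kind of link with its own `class` attribute and let a stylesheet distinguish them. The Python sketch below assumes that approach; the `LinkKind` names and the `doclink-` class prefix are invented for illustration, and choosing the right kind for each link is still the generator's problem.

```python
from enum import Enum
from html import escape


class LinkKind(Enum):
    # Hypothetical names for the kinds of link listed above.
    INSET = "inset"        # inset tables, graphics or boxes
    EXAMPLE = "example"    # examples for the code being documented
    API_USED = "api-used"  # APIs used in an example
    RELATED = "related"    # other related APIs
    EXTERNAL = "external"  # the wider web


def html_link(kind: LinkKind, href: str, text: str) -> str:
    """Render one link, marking its kind in the class attribute."""
    return (
        f'<a class="doclink-{kind.value}" '
        f'href="{escape(href, quote=True)}">{escape(text)}</a>'
    )
```

A stylesheet can then, for instance, show external links with a small marker and leave related-API links unadorned, without adding any words to the text.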

JavaScript may be supported. Or may not. Windows help and other closed hypertext systems do not support JavaScript (I assume Microsoft supports ActiveScript or DCOM or something), and people do print web pages.

Hypertext documentation is unordered, which can be both a liberation and a problem. Readers are free to click their way towards the information they seek, which is good, but that also means they may not have read the page you assume they have, and may not know what you think they know.

If the hypertext output is HTML, and has stable links, then it's possible to link into the documentation from the wider web. Useful for FAQs, HOWTOs and those evil knowledge bases, useful for search engines and their users.

<a href="http://…/…/Carrot.html#enable"> Note that #enable. Using #324AB4452 isn't even nearly as good.
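A sketch of how a generator might derive anchors like `#enable` from member names instead of emitting opaque identifiers. The function name, the character set kept in the slug, and the `-2`, `-3` suffix scheme for overloads are assumptions for illustration. Note the limits: suffixes depend on declaration order, so reordering overloads changes their anchors, and a member literally named `enable-2` would collide with a generated suffix.

```python
import re
from typing import Dict, List


def stable_anchors(member_names: List[str]) -> List[str]:
    """Derive readable fragment identifiers from member names.

    The first member with a given name gets the plain name; later
    members with the same name (overloads) get -2, -3, ... appended.
    """
    counts: Dict[str, int] = {}
    anchors = []
    for name in member_names:
        # Replace runs of unusual characters with "-" and trim the ends.
        slug = re.sub(r"[^A-Za-z0-9_.-]+", "-", name).strip("-") or "member"
        counts[slug] = counts.get(slug, 0) + 1
        anchors.append(slug if counts[slug] == 1 else f"{slug}-{counts[slug]}")
    return anchors
```

With this scheme, a link into the documentation keeps working for as long as the member keeps its name.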

If the hypertext is HTML with short, human-readable URLs, the URLs can be used in email and IM messages, or be printed in magazines.

And don't forget, HTML documentation is a buzzword. Supporting it gives buzzword karma, and we all need buzzword karma.