Arnt Gulbrandsen
2010-03-26

Comments on the LWN thread

LWN ran an article on Archiveopteryx. Some points.

It's BSD-licensed, not OSL-licensed. It was OSL-licensed until last year.

One commenter opined that we might not have tested big mailboxes. Well, we have, though I'm not sure exactly how big. We've routinely tested mailboxes of up to a million messages, and occasionally bigger ones. At a million it's quite simple: Most mail readers fall over. Mutt downloads a quarter-gigabyte at startup, and several others download twenty megabytes every few minutes.

If even Thunderbird needs hours or even days to reindex a mailbox, then I think the mailbox is unusable, no matter how badly or how well the server does. That's why we don't bother to test ten-million-message mailboxes.

Searching deserves a post of its own. Later. Suffice it to say that Archiveopteryx includes a clever query engine, but IMAP clients mostly don't search well, and those that try are limited by IMAP's unripe search facilities.

IMAP's search looked good to me when I first read the RFCs, but now I think the RFC is based on insufficient experience. (No, ESEARCH doesn't help with this. It's not meant to; it just provides select count(*)… and a few others, and opens a framework for other extensions.)

About mangling: When we canonicalise messages, we do not break valid messages. Usually we recover all meaning, and even when we can't it's not too bad.

A particularly nasty example: Suppose a message arrives claiming to be encoded using iso-2022-jp. RFC 1468 says start with ASCII, then send an escape sequence to switch to double-byte kanji encoding, then another escape sequence to switch back to ASCII. In reality, some of those double-byte segments contain an odd number of bytes. (I've seen this happen when Japanese Eudora and Pine users quoted naïve in a reply to me.) In that case, Archiveopteryx fixes the situation by changing the undecodable character to U+FFFD (in my case, naïve became na�ve).
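
To make the odd-byte case concrete, here is a minimal sketch in Python of a lenient decoder for the RFC 1468 encoding (ISO-2022-JP). It is not Archiveopteryx's code, and the function names are made up for illustration. It assumes only the ASCII, JIS-Roman and JIS X 0208 escape sequences matter, and it decodes each two-byte pair by setting the high bits and using Python's euc_jp codec. A leftover single byte at the end of a double-byte segment becomes U+FFFD.

```python
REPLACEMENT = "\ufffd"

# Escape sequences from RFC 1468 (three bytes each).
TO_KANJI = (b"\x1b$@", b"\x1b$B")   # switch to double-byte JIS X 0208
TO_ASCII = (b"\x1b(B", b"\x1b(J")   # switch to ASCII or JIS-Roman


def _decode_kanji(segment: bytes) -> str:
    """Decode a double-byte segment; an odd trailing byte becomes U+FFFD."""
    chars = []
    for i in range(0, len(segment) - 1, 2):
        # A JIS X 0208 pair with the high bits set is the EUC-JP form.
        pair = bytes(b | 0x80 for b in segment[i:i + 2])
        chars.append(pair.decode("euc_jp", errors="replace"))
    if len(segment) % 2:
        chars.append(REPLACEMENT)
    return "".join(chars)


def decode_lenient(raw: bytes) -> str:
    """Hypothetical helper: decode ISO-2022-JP without ever raising."""
    out = []
    kanji = False
    pos = 0
    while pos < len(raw):
        esc = raw.find(b"\x1b", pos)
        end = len(raw) if esc == -1 else esc
        chunk = raw[pos:end]
        if kanji:
            out.append(_decode_kanji(chunk))
        else:
            out.append(chunk.decode("ascii", errors="replace"))
        if esc == -1:
            break
        seq = raw[esc:esc + 3]
        if seq in TO_KANJI:
            kanji = True
        elif seq in TO_ASCII:
            kanji = False
        else:
            # Unknown escape sequence: replace the ESC byte and carry on.
            out.append(REPLACEMENT)
            pos = esc + 1
            continue
        pos = esc + 3
    return "".join(out)


# One valid kanji-mode pair (hiragana a) followed by a stray third byte.
broken = b"hi \x1b$B\x24\x22\x24\x1b(Bok"
print(decode_lenient(broken))
```

The point of decoding segment by segment is that the damage stays local: the valid pair before the stray byte and the ASCII text after the next escape sequence both survive.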

Arguably it might be better to keep the syntax error. My take is that I want to use my old email, so I need the search to be fast. Most mail (>99.99%) is either RFC 822 compliant or can be made so. The remnant is small enough that I'd rather have fast search and occasionally see na�ve.

Finally, maintenance. I'll keep working on Archiveopteryx, doing what pleases me. What I want to do decides what I do, and if you send me a bug report I may well ask you to test the fix, rather than test it myself.