UIDONLY implementation notes

I implemented the IMAP UIDONLY extension in Archiveopteryx a while ago. A couple of notes:

It was really simple; two lines of code changed to six lines starting at line 1154, plus a few lines to announce the extension etc.

It didn't bring any any performance improvement, because generating the EXISTS response is mandatory. With UIDONLY Archiveopteryx counts messages in order to report how many there are in an EXISTS response. Without it, Archiveopteryx makes a list of messages. The loop is the same.

It ought to improve performance for the case where a connection opens a large mailbox and another connection already has opened that. This seems rare with Archiveopteryx. It's more important than it seems though: Opening really large mailboxes can take many seconds. I decided to leave the code in.

The one-minute guide to implementing unicode email addresses

The unicode email address extensions are pleasantly simple to implement. Here is an overview of the RFCs and some notes I made while doing my first implementations; this posting is a very brief description of the protocol and format extensions involved. Despite its brevity it's nearly complete, because these extensions are so simple.

Mail message format: Using UTF8 everywhere is now permitted. Instead of using RFC2047 encoding, quoted-printable and more, messages can use UTF8 everywhere.

To: Jøran Øygårdvær <jøran@blåbærsyltetøy.gulbrandsen.priv.no>
Subject: Høy på pæra
Content-Type: text/plain; charset=utf8

Gørrlei av eksempler.

No encoding is necessary anywhere. Encoding is permitted but not necessary. The message above lacks From and Date, apart from that it's correct.

Sending mail using SMTP: The server advertises the SMTPUTF8 extension, the MAIL FROM command includes the argument SMTPUTF8, and the email addresses can then use UTF8.

$ telnet mx.example.com 25
Trying 2001:6d8::4269…
Connected to mx.example.com […More…]

Use UTF8 or Punycode for email addresses?

Unicode addresses in email, such as مثال@مثال.السعودية, can be written using either Punycode or UTF8. (Or, if you're feeling inventive, in another manner you invent.) Which is best?

UTF8 looks like this: From: Arabic Example <مثال@مثال.السعودية>, punycode might look like this (if it were legal, see below): From: Arabic Example <xn--mgbh0fb@xn--mgbh0fb.xn--mgberp4a5d4ar>

The answer follows from two of the design goals for the unicode email extensions:

Allow UTF8 everywhere
Extend email, don't restrict it

RFC 821 and its successors do not contain any rules such as you MUST NOT put the letter n next to an x, so Punycode is allowed. EAI allows Punycode by virtue of not forbidding what was previously allowed. But the right way is to use UTF8 everywhere. Use UTF8 in the subject field, in the body text, in the address… everywhere! That's allowed, it's a design goal, and it's better than Punycode for four reasons.

First, it's simpler than using Punycode in addresses, 2047 encoding in the subject text and qp/b64 encoding in the body text.

Second, it's very, very readable. A surprising amount of legacy software does the right thing if you send it UTF8, and that goes for humans who read email source too.

Third, Punycode's interpretation is only specified for domains, and if rumour is to be believed, people are using two incompatible encodings for the localpart. (In the example above, the second and third instances of xn-- are specified, but the first is not.) You're permitted to send a punycoded localpart to anyone, but the recipient is not required to interpret it in the way you intend and most do not.

Again, nothing in the standards requires the receiving software to understand what you mean with <xn--mgbh0fb@… The punycode example above will only work by luck.

Fourth, sending Punycode habituates users to accept random hex blobs in addresses. A phisher's dream.

So use UTF8 everywhere in the message. Mapping to Punycode is necessary when doing the MX lookup in order to transmit the message, but only then.

Implementation notes about unicode mail

I've implemented unicode mail three times now; in Postfix (paid for by CNNIC and not yet integrated), in aox and lastly in an old mail reader I'm porting from the Zaurus PDA to Android (unreleased as yet, send me mail if you'd like beta access). This is mostly a random collection of notes and remarks I collected while writing the code.

The specification was produced by an IETF working group called EAI (short for email address internationalisation). The WG produced two generations of RFCs. First, an experimental series which I ignore, then a revised, simplified and improved series. This covers the second generation, which takes the general position that unicode mail is only sent to recipients who understand it. There is no conversion during transport, and (almost) no fallback to ASCII.

RFC 6530 is an overview/introduction. It points to the other documents, and has some extra text. Worth reading.

6531 describes how unicode addresses are used with SMTP: MAIL FROM, RCPT TO and VRFY accept UTF8 addresses, and there's a safeguard to provoke a syntax error in case a unicode message body would otherwise reach someone who cannot accept it. […More…]

Quantifying network efficiency

There is only one way to send email with SMTP: Connect, send EHLO, MAIL FROM, RCPT TO, BODY and the message. There are variations, but they're small. Most protocols offer implementers much more choice, many even demand much more.

This post is about a way to quantify the efficiency of the choices an implementer makes. It's a bit extremist, because extremism is simple and requires little thought: Bytes transmitted are defined to be useful if they're displayed on the user's display before the use case ends, and to be waste in all other cases.

The wasted bytes may be avoidable waste, unavoidable waste, Schrödinger's waste or even useful waste — it doesn't matter. This method applies a simple and unfair definition. Because the test is simple you can apply it quickly, and usually you learn something from doing the analysis.

The method leads to three numbers and hopefully nonzero insight: Percentage of bytes used to display, number of bytes downloaded per character displayed, and finally number of bytes used per character displayed. You get the insight not from the three numbers, but rather as a side effect of computing the three numbers.

I'll elaborate with two longish examples. […More…]

Test messages for unicode mail addresses (EAI)

EAI is a set of RFCs to enable unicode email addresses. jøran@example.com and even jøran@blåbærsyltetøy.no are syntactically valid email addresses. There are RFCs to extend the email message syntax, to transmit these messages via SMTP, access them via POP and IMAP, and to provide read access by unextended IMAP/POP clients.

I wrote a set of test messages for EAI this morning and put them on github. Feel free to send me extensions and corrections.

Email address internationalisation

EAI defines a set of RFCs to provide non-ASCII email addresses. pål@eksempel.no. I looked at them with a view to implementing that in Archiveopteryx.

The good news: It's simple and sane.

The bad news: I can tell it's possible to spend a lot of time arguing about minor side issues.

IMAP MOVE is under way

A single-document working group has been formed, and I have submitted the document for publication. It's not visible yet — wheels have to turn in the IETF secretariat first.

I hope this will be a peaceful process resulting in a readable RFC and wide implementation. So far it looks good, no flames and about nine servers implement it.

On good and bad RFCs

The worst of the nine RFCs I have written is doubtlessly 5465, IMAP NOTIFY, which should have been good but is a disaster. Its main characteristics are that it's complex (both in terms of number of rules in the RFC and the number of features needed in a server), that it's much more complex than the first of its input documents, and that noone implements it.

My best may be 4978, IMAP COMPRESS=DEFLATE, which is much shorter than 5465, roughly as complex as its first draft version was, contains an informative section with implementation advice, and is widely implemented. […More…]

IMAP, aox, 3G

A little bit of 3G first: A 3G connection is in one of several modes, ranging from PCH (which uses hardly any power and can't transfer payload data) to DCH (which uses much power and is used for bulk transfers).

The way Archiveopteryx handles IMAP, POP and SMTP is very battery-friendly. […More…]

Bless SMTP pipelining

I am sitting in Madikeri, Karnataka, India, a nice small town in the middle of picturesque forests and hills. Regrettably, the beauty of the scenery is not matched by the quality of the area's GPRS coverage. […More…]

Comments on the LWN thread

LWN ran an article on Archiveopteryx. Some points.

It's BSD-licensed, not OSL-licensed. It was OSL-licensed until last year.

One commenter opined that we might not have tested big mailboxes. Well, we have. Not sure exactly how big. We've routinely tested up to a million, bigger occasionally. At a million it's quite simple: Most mail readers fall over. […More…]

3.1.3 done

Archiveopteryx 3.1.3 is done. Finally the unending series of bugs stopped disturbing the server, and the server started running reliably, and recover from problems the way it's meant to. […More…]