Arnt Gulbrandsen

Three programs, one feature

It's not something one does often, but I've implemented the same feature in three different programs. Not very different, all are written in the same programming language for the same platform, and all are servers.

Same platform, same language, same task, same developer... you would think the three patches would end up looking similar? They did not, not at all.

The feature I wrote is support for using UTF8 on SMTP, which I've implemented for Postfix, Sendmail and Qmail, all of which run on Linux/POSIX systems. I tried to follow the code style for each of them, and surprised myself at how different my code looked.

One patch is well-engineered, prim and proper.

The next is for an amorphous blob of software. The patch is itself amorphous, and makes functions even longer that were too long already. Yet it's half as long as the first patch. The two are, in my own judgment, about equally readable. One wins on length, the other on readability, they're roughly tied overall. This surprised me not a little.

The third is a short, readable patch which one might call an inspired hack. It's much smaller than the others and easily wins on readability too.

It wasn't supposed to be like that, was it? Good engineering shouldn't give the most verbose patch, and the hack shouldn't be the most lucid of the three.

I see two things here:

First: Proper engineering has its value, but perhaps not as much as common wisdom says. Moderately clean code offers almost all of the value of really clean code.

Second: A small program is easy to work with, such as the MVPs that are so fashionable these days. But ease of modification isn't everything: the smallest of the three servers has fallen out of use because the world changed and it stopped being viable.

Some random verbiage on each of the three servers and patches:


Qmail, the last MTA I patched, is an inspired hack, and so's my patch. I expect Qmail works wondrously provided you use it as Dan Bernstein intended (ie. in the world of 1995, to solve the problems of 1995). Since the SMTPUTF8 extension is well-designed and Qmail doesn't implement DSN, my patch ends up being only a few tens of lines:

Qmail's SMTP server advertises a new extension, accepts it, and uses the right keyword in the Received field.

When Qmail's SMTP client is told to send mail, it checks for that Received keyword as well as for UTF8 addresses, and adds the right protocol keyword if necessary.
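A rough sketch of that client-side decision, written in Python for brevity rather than in the style of the actual patch. The function names are hypothetical, and the keyword list is an assumption: it holds the SMTP "with" protocol types that, as far as I know, RFC 6531 registers. The patch itself may check something narrower.

```python
import re

# Assumed set of "with" keywords that mark a message as received via SMTPUTF8.
UTF8_WITH_KEYWORDS = {"UTF8SMTP", "UTF8SMTPA", "UTF8SMTPS", "UTF8SMTPSA"}


def received_says_utf8(received_value: str) -> bool:
    """Return True if a Received field value has a 'with UTF8SMTP...' clause."""
    match = re.search(r"\bwith\s+([A-Za-z0-9-]+)", received_value, re.IGNORECASE)
    return bool(match) and match.group(1).upper() in UTF8_WITH_KEYWORDS


def needs_smtputf8(sender: str, recipients: list[str], top_received: str) -> bool:
    """Decide whether the outgoing transaction has to ask for SMTPUTF8.

    Either a non-ASCII envelope address or a Received keyword left by the
    receiving server is enough; nothing has to be stored in the queue file.
    """
    if any(not address.isascii() for address in [sender, *recipients]):
        return True
    return received_says_utf8(top_received)
```

The point of the design is that the Received field doubles as the only record of how the message arrived, so the client can reconstruct the answer from the message itself.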

That's all. Qmail's simplicity makes it that simple. One function implements a Qmail-like state machine to avoid keeping state in the queue file and looks different from my typical code, the rest is overwhelmingly simple.

This same simplicity has its drawbacks and I wouldn't recommend using Qmail for receiving mail on today's internet. I understand some people are using it for outgoing mail at online shops, though.

This patch is in Gentoo, you can read it there or install qmail-1.06-r5. I also have a version that doesn't depend on the TLS patch, send me mail if you'd like a copy of that.


Sendmail is the oldest MTA on the planet, and shows its age. While working with the source I saw that the maintainers mean well and have tried to update the program to comply with good new practice, but they're not doing very much of that. I think refactoring and dropping features is unusually difficult in their case. Maybe they dare not remove code that may still be in use, and their user base includes the oldest and most diverse sites. If there are 20-year-old mail servers anywhere, they probably run Sendmail.

Sendmail suffers from some overlong functions. Over the decades those functions have grown, and grown, and grown, and today they're so big that they effectively block dependency injection. That, in turn, blocks unit testing. My patch adds zero unit tests and my automatic system tests aren't portable to the maintainers' systems. Sad.

The Sendmail patch is much larger than the Qmail one, partly because sendmail's more orderly and partly because it's less orderly.

Sendmail stores information in queue files in an orderly, extensible way, unlike Qmail. Because of that, my patch can store its extra bit sensibly, which adds code to define, read, write and process the extra bit. Qmail didn't allow any of that, which was limiting, but oh how simple!

Sendmail's overlong functions led to less orderly changes than Qmail. I wanted to write code as tidy as that in Qmail, but it just wouldn't fit.

This patch is in the Sendmail maintainers' queue; send me mail if you'd like to have a copy.


The first MTA I patched was Postfix, almost three years ago now. That patch is even bigger than the one for Sendmail.

Postfix is similar to Sendmail in some ways, but more well-ordered, better documented and without all the dinosaur features. Postfix does not contain any overlong functions. (Maybe you think some functions are too long or have other problems, but the maintainers like Postfix as it is. I keep my opinion to myself: a good patch follows the maintainer's desired style and that's all I have to say.)

Postfix contains all the functionality you need on the network today, and the maintainers (chiefly Wietse Venema) are amazingly good at keeping the features complete and orthogonal. If you combine any two Postfix settings, the combination works as intended because Wietse has thought about that combination.

The price for that is that there are a lot of cases to handle. Qmail autodetects ASCII/UTF8 when you send mail via its command-line interface, and Sendmail does the same; Postfix has more complex logic and supports four cases instead of two.

This patch was integrated into Postfix 3.0.0, and is in all the major Linux distributions by now.


Use UTF8 or Punycode for email addresses?

Unicode addresses in email, such as مثال@مثال.السعودية, can be written using either Punycode or UTF8. (Or, if you're feeling inventive, in another manner you invent.) Which is best?

UTF8 looks like this: From: Arabic Example <مثال@مثال.السعودية>, punycode is From: Arabic Example <xn--mgbh0fb@xn--mgbh0fb.xn--mgberp4a5d4ar>.

The answer follows from two of the design goals for the unicode email extensions:

  1. Allow UTF8 everywhere
  2. Extend email, don't restrict it

RFC 821 and its successors do not contain any rules such as you MUST NOT put the letter n next to an x, so Punycode is allowed. EAI allows Punycode by virtue of not forbidding what was previously allowed. But the right way is to use UTF8 everywhere. Use UTF8 in the subject field, in the body text, in the address… everywhere! That's allowed, it's a design goal, and it's better than Punycode for four reasons.

First, it's simpler than using Punycode in addresses, 2047 encoding in the subject text and qp/b64 encoding in the body text.

Second, it's very, very readable. A surprising amount of legacy software does the right thing if you send it UTF8, and that goes for humans who read email source too.

Third, Punycode is specified for the domain part of addresses, but not for the localpart, and if rumour is to be believed, people are using two incompatible encodings for the localpart. (In the example above, the second and third instances of xn-- are specified, but the first is not and one vendor reputedly does it differently.)

Fourth, sending Punycode habituates users to accept random hex blobs in addresses. A phisher's dream.

So use UTF8 everywhere in the message. Mapping to Punycode is necessary when doing the MX lookup in order to transmit the message, but only then.
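To make the "only then" concrete, here is a minimal sketch in Python. The function name is hypothetical, and it leans on Python's built-in idna codec, which implements the older IDNA 2003 rules; a real MTA would use whatever IDNA library it already links against.

```python
def mx_lookup_name(address: str) -> str:
    """Return the ASCII (Punycode) form of an address's domain.

    The address itself stays in UTF8 everywhere else; this form is meant
    for the DNS query only. The localpart is never converted.
    """
    _localpart, _, domain = address.rpartition("@")
    return domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    # The message keeps the UTF8 address; only the MX query uses this name.
    print(mx_lookup_name("مثال@مثال.السعودية"))
```

For the Arabic example address this should give the same Punycode domain as shown earlier, while a plain ASCII domain passes through unchanged.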


Installing and testing Postfix with unicode address support

By request: A step-by-step guide to installing/testing unicode addresses with Postfix. Perhaps overly detailed.

I'll use a new linux/ubuntu host here. As it happens, I use a 64-bit ubuntu 14.04 at Amazon.

The following commands prepare the host. This updates the host with the latest package database and packages, so that the later commands won't fail due to package unavailability or version conflicts. (more…)


Implementation notes about unicode mail

I've implemented unicode mail three times now; in Postfix (paid for by CNNIC and not yet integrated), in aox and lastly in an old mail reader I'm porting from the Zaurus PDA to Android (unreleased as yet, send me mail if you'd like beta access). This is mostly a random collection of notes and remarks I collected while writing the code.

The specification was produced by an IETF working group called EAI (short for email address internationalisation). The WG produced two generations of RFCs. First, an experimental series which I ignore, then a revised, simplified and improved series. This covers the second generation, which takes the general position that unicode mail is only sent to recipients who understand it. There is no conversion during transport, and (almost) no fallback to ASCII.

RFC 6530 is an overview/introduction. It points to the other documents, and has some extra text. Worth reading.

6531 describes how unicode addresses are used with SMTP: MAIL FROM, RCPT TO and VRFY accept UTF8 addresses, and there's a safeguard to provoke a syntax error in case a unicode message body would otherwise reach someone who cannot accept it. (more…)
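As a small illustration of the "only sent to recipients who understand it" position, the sketch below shows how a client might build its MAIL FROM command. It is Python, the helper names are hypothetical, and it assumes the EHLO reply is available as a list of raw lines such as "250-SMTPUTF8".

```python
def advertises_smtputf8(ehlo_lines: list[str]) -> bool:
    """Check an EHLO reply for the SMTPUTF8 keyword.

    The first line is the server's greeting, so only later lines are
    treated as extension keywords.
    """
    for line in ehlo_lines[1:]:
        parts = line[4:].split()  # drop the "250-" or "250 " prefix
        if parts and parts[0].upper() == "SMTPUTF8":
            return True
    return False


def mail_from_command(sender: str, message_needs_utf8: bool,
                      ehlo_lines: list[str]) -> str:
    """Build the MAIL FROM command, or refuse if the server cannot cope.

    There is no downgrade path: a unicode message for a server that does
    not advertise SMTPUTF8 is an error, not something to convert.
    """
    if not message_needs_utf8 and sender.isascii():
        return f"MAIL FROM:<{sender}>"
    if not advertises_smtputf8(ehlo_lines):
        raise ValueError("server does not advertise SMTPUTF8")
    return f"MAIL FROM:<{sender}> SMTPUTF8"
```

Raising an error instead of converting mirrors the second-generation design: no conversion during transport.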


A unicode email autoresponder

I've set up a test address for the SMTPUTF8 extension created by the IETF EAI working group.

If you send mail to jøran@blåbærsyltetø Jøran will send you a stock reply, which you can use to test that unicode mail works in both directions.

For the moment you must be able to send via IPv6. Jøran can send the reply back via either IPv4 or IPv6, but you have to send the initial message via IPv6. I intend to add a v4-capable secondary MX later.

I have or can arrange other testing too; send me mail if you're interested.


Test messages for unicode mail addresses (EAI)

EAI is a set of RFCs to enable unicode email addresses. jø and even jøran@blåbærsyltetø are syntactically valid email addresses. There are RFCs to extend the email message syntax, to transmit these messages via SMTP, access them via POP and IMAP, and to provide read access by unextended IMAP/POP clients.

I wrote a set of test messages for EAI this morning and put them on github. Feel free to send me extensions and corrections.


Email address internationalisation

EAI defines a set of RFCs to provide non-ASCII email addresses. på I looked at them with a view to implementing that in Archiveopteryx.

The good news: It's simple and sane.

The bad news: I can tell it's possible to spend a lot of time arguing about minor side issues.