Threading email using Thread-Index

Microsoft Exchange sends email containing a header field called Thread-Index that does much the same job as References. I've no idea why Exchange does that instead of the normal way. But I have found out how to parse it, and it's not terribly difficult. It's easiest to explain using examples, so here are the Thread-Index fields from four messages:

Thread-Index: Aca8OXuAU3E0OYfxS/CjgSLFGePpiQAdZqFQACzEh/AAmOpSkA==
Thread-Index: Ace2UdoJVaeQeVVpSp2ZxZp3q7pBrg==
Thread-Index: Aca6Q3T3KOXW2RS5EX9R13340HQWLP==
Thread-Index: AcgB760UE69bP0PFT52+hzoqOvhDDQAAAo8g

The first 30 characters are different for all of these, which means that the messages belong to different threads. However, three of the following four messages belong to the same thread:

Thread-index: Acnk7AtURXtWl9jTRdGX7mDGIxfHtw==
Thread-Index: AcZJXMEg7BHm/IX0SSuMswWV9Kglbg==
Thread-index: Acnk7AtURXtWl9jTRdGX7mDGIxfHtwAAAo8g
Thread-index: Acnk7AtURXtWl9jTRdGX7mDGIxfHtwAAAo8gqFQACz

The value is really base64-encoded entropy, but parsing and using it doesn't require decoding. Just take the first 30 characters, which carry the first 176 bits. If they are the same for two messages, then those messages belong to the same thread. Within a thread, a reply's Thread-Index is the same as that of its parent, with a random suffix added. In my second example, the fourth message is a reply to the third, and the third is a reply to the first.
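A minimal sketch of that rule, working directly on the base64 text as described. The helper names (`thread_key`, `is_descendant`, `group_by_thread`) are illustrative, not from any library. One caveat: 30 base64 characters hold 180 bits, so the 30th character carries the last 2 header bits plus 4 bits of whatever follows (zero padding for a thread's first message). The prefix test agrees with the examples above; a stricter implementation could decode and compare the first 22 bytes instead.

```python
from __future__ import annotations

from collections import defaultdict

THREAD_KEY_CHARS = 30  # leading base64 characters that identify the thread


def thread_key(thread_index: str) -> str:
    """Return the part of a Thread-Index value that names the thread."""
    return thread_index.strip()[:THREAD_KEY_CHARS]


def same_thread(a: str, b: str) -> bool:
    return thread_key(a) == thread_key(b)


def is_descendant(child: str, ancestor: str) -> bool:
    """True if child extends ancestor's value, i.e. is a (possibly indirect) reply."""
    # Padding is not part of the prefix, so strip it before comparing.
    c = child.strip().rstrip("=")
    a = ancestor.strip().rstrip("=")
    return len(c) > len(a) and c.startswith(a)


def group_by_thread(values: list[str]) -> dict[str, list[str]]:
    """Bucket Thread-Index values by their thread key."""
    threads: dict[str, list[str]] = defaultdict(list)
    for value in values:
        threads[thread_key(value)].append(value)
    return dict(threads)
```

A message's direct parent is then the longest other value in the same thread for which `is_descendant` holds.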

Here is some code that converts Thread-Index to References.
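One way to do the conversion, sketched under assumptions: taking the layout Microsoft documents (a 22-byte header block followed by one 5-byte block per reply level), each proper prefix of the decoded value stands in for one ancestor. The `<hex@thread-index.invalid>` naming scheme and all function names are hypothetical choices for illustration; `.invalid` is a reserved top-level domain, so these synthetic IDs cannot collide with real Message-IDs.

```python
from __future__ import annotations

import base64
import binascii

HEADER_LEN = 22  # header block: 22 bytes (176 bits)
CHILD_LEN = 5    # assumed: each reply level appends one 5-byte block


def decode_thread_index(value: str) -> bytes:
    """Decode a Thread-Index value, restoring any missing '=' padding."""
    value = value.strip()
    value += "=" * (-len(value) % 4)
    try:
        raw = base64.b64decode(value, validate=True)
    except binascii.Error as exc:
        raise ValueError(f"Thread-Index is not base64: {value!r}") from exc
    if len(raw) < HEADER_LEN or (len(raw) - HEADER_LEN) % CHILD_LEN:
        raise ValueError(f"unexpected Thread-Index length: {len(raw)} bytes")
    return raw


def synthetic_message_id(prefix: bytes) -> str:
    """Hypothetical scheme: hex-encode the prefix under the reserved .invalid TLD."""
    return f"<{prefix.hex()}@thread-index.invalid>"


def thread_index_to_references(value: str) -> list[str]:
    """Synthetic IDs for every ancestor, oldest (the thread root) first."""
    raw = decode_thread_index(value)
    return [synthetic_message_id(raw[:end])
            for end in range(HEADER_LEN, len(raw), CHILD_LEN)]
```

For the synthetic References to resolve, a threader would also index each message under `synthetic_message_id(decode_thread_index(value))`, the ID built from its own full value.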

Generating Thread-Index is almost as simple: you just need 176 bits of entropy and a base64 encoder. Why bother, though? Exchange doesn't dominate the market any more. Gmail is king of the hill now and it sends References, so Exchange has to parse References just like the rest of us.
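Should you want to anyway, here is a sketch of generation under the simple model described here: 22 random bytes for a new thread, and a random suffix per reply. The 5-byte suffix length is an assumption taken from Microsoft's published format; filling everything with random bytes, rather than the timestamp and GUID fields that format specifies, is the simplification. The function names are illustrative.

```python
import base64
import secrets

HEADER_LEN = 22  # 176 random bits start a new thread
CHILD_LEN = 5    # assumed suffix length per reply


def new_thread_index() -> str:
    """Start a new thread: 22 random bytes, base64-encoded."""
    return base64.b64encode(secrets.token_bytes(HEADER_LEN)).decode("ascii")


def reply_thread_index(parent: str) -> str:
    """Reply: the parent's bytes plus a random 5-byte suffix."""
    parent = parent.strip()
    raw = base64.b64decode(parent + "=" * (-len(parent) % 4))
    return base64.b64encode(raw + secrets.token_bytes(CHILD_LEN)).decode("ascii")
```

Because the 30th base64 character straddles the header and the first suffix byte, a reply made this way shares only the first 29 characters with its parent 15 times in 16; comparing decoded bytes avoids that.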

Update: Microsoft has a specification on the web and Meridian Discovery has reverse engineered what's actually done. The format is as described above, but there's more detail. The initial 176 bits aren't just 176 random bits, but rather 176 more or less random bits made in a specific manner. I wouldn't trust the senders to do that according to spec, and will still treat this as 176+n×40 random bits.

Lack of PGP support in aox

I'm not eager to add any PGP support to Archiveopteryx. Such support shouldn't be needed, but it is, because PGP's signature checking is much stricter than e.g. DKIM's. DKIM thinks a duck is a duck; PGP cares deeply about the details. A quoted-printable duck is not the same as a plaintext duck, and two quoted-printable ducks may not be the same either. Archiveopteryx faithfully implements sixty email-related RFCs, yet mail stored in or processed by aox frequently cannot be verified by PGP.
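To make the duck concrete: quoted-printable lets an encoder leave a plain ASCII letter alone, write it as an `=XX` escape, or insert a soft line break, so several byte sequences decode to the same text. A signature computed over the encoded bytes survives only if those exact bytes survive. This small illustration uses only Python's standard `quopri` and `hashlib`; it does not implement PGP or DKIM canonicalization.

```python
import hashlib
import quopri

# Three valid quoted-printable encodings of the same word.
encodings = [
    b"duck",       # left as-is
    b"=64uck",     # 'd' (0x64) written as an escape
    b"du=\r\nck",  # soft line break ("=" at end of line) in the middle
]

for encoded in encodings:
    decoded = quopri.decodestring(encoded)
    digest = hashlib.sha256(encoded).hexdigest()[:16]
    # Same decoded text every time, but a different hash of the encoded bytes.
    print(f"{encoded!r:>14} -> {decoded!r}  sha256(encoded)={digest}")
```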

However, I care about encryption and privacy, and PGP has the mindshare and is widely considered The Solution. The problem with The Solution is that over the years, it has remained steadily at 0.0% adoption. At the moment slightly below 0.005% of email users have PGP keys, and only some fraction of those actually use PGP. I infer from that number that PGP has defects that block its adoption almost completely. I have some ideas about what those defects are, but that doesn't matter: whatever they are, their result is to block adoption.

This has been the case for 20 years, and by now I consider PGP hopeless. PGP hinders encryption and hurts our privacy; it doesn't help. I don't want to write any code to support that. Perhaps only ten lines of code and a few tests are needed, but I just don't want to write even that.

(Am I doing something else? Yes, I am, actually. I'll write about that later.)

Update: After writing the above, I suddenly remembered this old dystopian novel. The scenes in the 31st-floor offices remind me of PGP: worthy, earnest people working hard, doing the best work they can.

Implementation notes about unicode mail

I've implemented unicode mail three times now: in Postfix (paid for by CNNIC and not yet integrated), in aox, and lastly in an old mail reader I'm porting from the Zaurus PDA to Android (unreleased as yet; send me mail if you'd like beta access). This is mostly a random collection of notes and remarks I made while writing the code.

The specification was produced by an IETF working group called EAI (short for email address internationalisation). The WG produced two generations of RFCs: first an experimental series, which I ignore, then a revised, simplified and improved series. This covers the second generation, which takes the general position that unicode mail is only sent to recipients who understand it. There is no conversion during transport, and (almost) no fallback to ASCII.

RFC 6530 is an overview/introduction. It points to the other documents, and has some extra text. Worth reading.

RFC 6531 describes how unicode addresses are used with SMTP: MAIL FROM, RCPT TO and VRFY accept UTF-8 addresses, and there's a safeguard to provoke a syntax error in case a unicode message body would otherwise reach someone who cannot accept it.
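A rough client-side sketch of that decision, under the simplifying assumption that a message needs SMTPUTF8 exactly when an envelope address or the header section contains non-ASCII (a non-ASCII body alone needs only 8BITMIME). The function names are hypothetical; the `SMTPUTF8` EHLO keyword and the `MAIL FROM:<address> SMTPUTF8` parameter come from RFC 6531.

```python
from __future__ import annotations


def needs_smtputf8(sender: str, recipients: list[str], message: bytes) -> bool:
    """True if the envelope or the header section contains non-ASCII."""
    # The header section ends at the first empty line.
    header_section = message.split(b"\r\n\r\n", 1)[0]
    addresses = [sender, *recipients]
    return any(not a.isascii() for a in addresses) or not header_section.isascii()


def can_relay(message_needs_utf8: bool, server_extensions: set[str]) -> bool:
    """A UTF-8 message may only go to a server that advertised SMTPUTF8 in EHLO."""
    return not message_needs_utf8 or "SMTPUTF8" in server_extensions


def mail_from_command(sender: str, use_smtputf8: bool) -> str:
    """Build the MAIL FROM line, adding the SMTPUTF8 parameter when needed."""
    command = f"MAIL FROM:<{sender}>"
    if use_smtputf8:
        command += " SMTPUTF8"
    return command + "\r\n"
```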

Test messages for unicode mail addresses (EAI)

EAI is a set of RFCs to enable unicode email addresses. jøran@example.com and even jøran@blåbærsyltetøy.no are syntactically valid email addresses. There are RFCs to extend the email message syntax, to transmit these messages via SMTP, access them via POP and IMAP, and to provide read access by unextended IMAP/POP clients.

I wrote a set of test messages for EAI this morning and put them on github. Feel free to send me extensions and corrections.

Email address internationalisation

EAI defines a set of RFCs to provide non-ASCII email addresses, such as pål@eksempel.no. I looked at them with a view to implementing them in Archiveopteryx.

The good news: It's simple and sane.

The bad news: I can tell it's possible to spend a lot of time arguing about minor side issues.

From: Charlie Root <root@…> — but which Charlie?

I hate it when different, independent computers all send me mail from my close friend Charlie Root. Here's an aox hack to ease the pain.

Privacy, Received, etc.

Summary: Archiveopteryx is loyal to its users, right or wrong.