Arnt Gulbrandsen
About meAbout this blog
2015-02-27

Threading email using Thread-Index

Microsoft Exchange sends email containing a header field called Thread-Index that does much the same job as References. I've no idea why Exchange does that instead of the normal way. But I have found out how to parse it, and it's not terribly difficult. It's easiest to explain using examples, so here are the Thread-Index fields from four messages:

Thread-Index: Aca8OXuAU3E0OYfxS/CjgSLFGePpiQAdZqFQACzEh/AAmOpSkA==
Thread-Index: Ace2UdoJVaeQeVVpSp2ZxZp3q7pBrg==
Thread-Index: Aca6Q3T3KOXW2RS5EX9R13340HQWLP==
Thread-Index: AcgB760UE69bP0PFT52+hzoqOvhDDQAAAo8g

The first 30 characters are different for all of these, which means that the messages belong to different threads. Hoever, three out of the following four messages belong to the same thread:

Thread-index: Acnk7AtURXtWl9jTRdGX7mDGIxfHtw==
Thread-Index: AcZJXMEg7BHm/IX0SSuMswWV9Kglbg==
Thread-index: Acnk7AtURXtWl9jTRdGX7mDGIxfHtwAAAo8g
Thread-index: Acnk7AtURXtWl9jTRdGX7mDGIxfHtwAAAo8gqFQACz

The value is really base64-encoded entropy, but parsing and using it doesn't require decoding. Just take the first 30 characters (ie. 176 bits). If they are the same for two messages, then those messages belong to the same thread. Within a thread, a reply's Thread-Index is the same as that of its parent, with a random suffix added. In my second example, the fourth message is a reply to the third, and the third is a reply to the first.

Here is some code that converts Thread-Index to References.

Generating Thread-Index is almost as simple, you just need 176 bits of high-quality entropy and a base64 encoder. Why bother, though? Exchange doesn't dominate the market any more. Gmail is king of the hill now and it sends References, so Exchange has to parse References just like the rest of us.

2015-02-25

Lack of PGP support in aox

I'm not eager to add any PGP support in Archiveopteryx. That shouldn't be needed, but is, because PGP's signature checking is much stricter than e.g. DKIM's. DKIM thinks a duck is a duck, PGP cares deeply about the details. A quoted-printable duck is not the same as a plaintext duck, and two quoted-printable ducks may not be the same either. Archiveopteryx faithfully implements sixty email-related RFCs and mail stored in or processed by aox frequently cannot be verified by PGP.

However. I care about encryption and privacy, and PGP has the mindshare and is widely considered The Solution. The problem with The Solution is that over the years, it has remained steadily at 0.0% adoption. At the moment slightly below 0.005% of email users have PGP keys, and some fraction of those 0.005% actually use PGP. I infer from that number that PGP has defects that block its adoption almost completely. I have some ideas what those defects are, but that doesn't matter, because whatever they are, their result is to block adoption.

This has been the case for 20 years, and by now I consider PGP to be hopeless. PGP hinders encryption and hurts our privacy, it doesn't help. I don't want to write any code to support that. Perhaps only ten lines of code and a few tests are needed, but I just don't want to write even that.

(Am I doing something else? Yes, I am, actually. I'll write about what later.)

Update:After writing the above, I suddently remembered this old dystopian novel. The scenes in the 31st floor offices remind me of PGP. Worthwhile ernest people working hard, doing the best work they can.

2014-06-12

Implementation notes about unicode mail

I've implemented unicode mail three times now; in Postfix (paid for by CNNIC and not yet integrated), in aox and lastly in an old mail reader I'm porting from the Zaurus PDA to Android (unreleased as yet, send me mail if you'd like beta access). This is mostly a random collection of notes and remarks I collected while writing the code.

The specification was produced by an IETF working group called EAI (short for email address internationalisation). The WG produced two generations of RFCs. First, an experimental series which I ignore, then a revised, simplified and improved series. This covers the second generation, which takes the general position that unicode mail is only sent to recipients who understand it. There is no conversion during transport, and (almost) no fallback to ASCII.

RFC 6530 is an overview/introduction. It points to the other documents, and has some extra text. Worth reading.

6531 describes how unicode addresses are used with SMTP: MAIL FROM, RCPT TO and VRFY accept UTF8 addresses, and there's a safeguard to provoke a syntax error in case a unicode message body would otherwise reach someone who cannot accept it. (more…)

2013-07-28

Test messages for unicode mail addresses (EAI)

EAI is a set of RFCs to enable unicode email addresses. jøran@example.com and even jøran@blåbærsyltetøy.no are syntactically valid email addresses. There are RFCs to extend the email message syntax, to transmit these messages via SMTP, access them via POP and IMAP, and to provide read access by unextended IMAP/POP clients.

I wrote a set of test messages for EAI this morning and put them on github. Feel free to send me extensions and corrections.

2012-09-06

Email address internationalisation

EAI defines a set of RFCs to provide non-ASCII email addresses. pål@eksempel.no. I looked at them with a view to implementing that in Archiveopteryx.

The good news: It's simple and sane.

The bad news: I can tell it's possible to spend a lot of time arguing about minor side issues.

2012-07-22

Oryx failed

About ten years ago, Abhijit Menon-Sen and I discussed email storage, archiving and access, and set out to write something good. We failed. It's time to think about why; if this reads like a stream of consciousness it's because I'm thinking.

We were right: Archiving all mail and searching it is the right approach. Gmail has a billion users. (Hm. How many of the people who told me I was wrong in 2002 use gmail now? I can think of at least five.) But we still failed. (more…)

2012-06-12

Fault tolerant programs and programmers

Archiveopteryx git head crashes a bit. Not every day, but some people reports that it crashes every week or month, at random times. Clearly there is a bug. Abhijit and I have discussed it and found a way to contain it, and I've written the code.

But I haven't found a way to push the fix to the master tree. I seem unable to commit and push that code. My soul wants to find the bug and fix it, not contain it.

Meanwhile, I had an appointment with the dentist this morning.

In the waiting room I read a fascinating blog post about a Chromium exploit. Sergey Glazunov, clearly an admirably clever hacker, stitched together fourteen bugs, quirks and missed hardening opportunities to form a critical exploit. The bugtracking information for one of the bugs shows that it was reported, discussed for a few days, then it was idle until Sergey leveraged it, and then it was fixed.

Chromium is a nice browser, and I appreciate the hardening and exploit resistance the team has added. I particularly appreciate the team's honesty: They run their pwnium contests and are frank about the results.

But now I am even less happy about making fault tolerant code. I feel that it may be mentally difficult to make a program tolerate faults and at the same time make a programmer not tolerate faults.

2011-04-22

A sort of bug tracker

All bug trackers suck. That's why Abhijit and I really don't want to use one for aox.

Here's what we do instead. It's subtle.

There's a notes file in the version-controlled source tree. The only way to add to a note (ie. file a bug) is to make us do it. There are some scripts that look at the file. (more…)