Threading email using Thread-Index
Microsoft Exchange sends email containing a header field called Thread-Index that does much the same job as References. I've no idea why Exchange does that instead of the normal way. But I have found out how to parse it, and it's not terribly difficult. It's easiest to explain using examples, so here are the Thread-Index fields from four messages:
Thread-Index: Aca8OXuAU3E0OYfxS/CjgSLFGePpiQAdZqFQACzEh/AAmOpSkA==
Thread-Index: Ace2UdoJVaeQeVVpSp2ZxZp3q7pBrg==
Thread-Index: Aca6Q3T3KOXW2RS5EX9R13340HQWLP==
Thread-Index: AcgB760UE69bP0PFT52+hzoqOvhDDQAAAo8g
The first 30 characters are different for all of these, which means that the messages belong to different threads. However, three of the following four messages belong to the same thread:
Thread-index: Acnk7AtURXtWl9jTRdGX7mDGIxfHtw==
Thread-Index: AcZJXMEg7BHm/IX0SSuMswWV9Kglbg==
Thread-index: Acnk7AtURXtWl9jTRdGX7mDGIxfHtwAAAo8g
Thread-index: Acnk7AtURXtWl9jTRdGX7mDGIxfHtwAAAo8gqFQACz
The value is really base64-encoded entropy, but parsing and using it doesn't require decoding. Just take the first 30 characters, which cover the first 176 bits (22 bytes). If they are the same for two messages, then those messages belong to the same thread. Within a thread, a reply's Thread-Index is the same as its parent's, with a random suffix added. In my second example, the fourth message is a reply to the third, and the third is a reply to the first.
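The two rules above can be written down directly, working on the base64 text without decoding. This is a minimal sketch; the function names `same_thread` and `descends_from` are made up for illustration. Stripping the `=` padding before the prefix test is a choice made here, because a parent's value ends in padding that its replies don't carry.

```python
THREAD_PREFIX_LEN = 30  # base64 characters that cover the 176-bit thread header


def same_thread(a: str, b: str) -> bool:
    """Two Thread-Index values share a thread if their first 30 characters match."""
    a, b = a.strip(), b.strip()
    if len(a) < THREAD_PREFIX_LEN or len(b) < THREAD_PREFIX_LEN:
        return False  # too short to carry a full header
    return a[:THREAD_PREFIX_LEN] == b[:THREAD_PREFIX_LEN]


def descends_from(child: str, ancestor: str) -> bool:
    """True if child's value is ancestor's value with one or more suffixes appended."""
    child = child.strip().rstrip("=")
    ancestor = ancestor.strip().rstrip("=")
    return len(child) > len(ancestor) and child.startswith(ancestor)
```

With the second set of examples, `same_thread` is true for the first, third and fourth values and false against the second, and `descends_from(fourth, third)` and `descends_from(third, first)` are both true. The prefix test assumes the bits that fill out the parent's last base64 character are also zero in the reply, which holds for the examples above.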
Here is some code that converts Thread-Index to References.
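A minimal Python sketch of such a conversion follows. It decodes the value to bytes, because the reply blocks don't always end on a base64 character boundary (22 bytes is 29⅓ characters). It assumes a 22-byte header plus 5 bytes per reply, which is what the example values decode to and what Microsoft's specification describes. The Message-ID form built by the hypothetical helper `synthetic_id`, and the default domain `thread-index.invalid`, are inventions for this sketch; `.invalid` is a TLD reserved by RFC 2606, so such IDs can't collide with real ones.

```python
import base64

HEADER_LEN = 22  # the 176-bit header shared by every message in the thread
CHILD_LEN = 5    # assumption: each reply appends a 5-byte block


def decode_thread_index(value: str) -> bytes:
    """Decode a Thread-Index value, restoring any '=' padding the sender omitted."""
    value = value.strip()
    return base64.b64decode(value + "=" * (-len(value) % 4))


def synthetic_id(raw: bytes, domain: str) -> str:
    """Hypothetical Message-ID form built from (a prefix of) a Thread-Index."""
    return f"<{raw.hex()}@{domain}>"


def thread_index_to_references(value: str, domain: str = "thread-index.invalid") -> tuple[str, list[str]]:
    """Return (own_id, references), with the oldest ancestor first."""
    raw = decode_thread_index(value)
    if len(raw) < HEADER_LEN:
        raise ValueError("Thread-Index is shorter than 22 bytes")
    # Every proper prefix of length 22 + 5k names an ancestor. A trailing
    # partial block is treated as part of this message's own identifier.
    references = [synthetic_id(raw[:n], domain) for n in range(HEADER_LEN, len(raw), CHILD_LEN)]
    return synthetic_id(raw, domain), references
```

For the second set of examples, the first message gets no references, the third gets one (the first message's synthetic ID) and the fourth gets two (the first's and the third's), which is the References chain a normal client would have sent.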
Generating Thread-Index is almost as simple: you just need 176 bits of entropy and a base64 encoder. Why bother, though? Exchange doesn't dominate the market any more. Gmail is king of the hill now and it sends References, so Exchange has to parse References just like the rest of us.
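If you do want to generate one anyway, here is a minimal sketch that follows the simplification above and treats the whole value as random bits, not Microsoft's exact layout. The 5-byte reply block is an assumption based on the example values, and the function names are hypothetical. Clearing the top four bits of each new block is a choice made for this sketch so that the parent's base64 text stays a prefix of the reply's, which keeps the 30-character comparison working.

```python
import base64
import secrets

HEADER_LEN = 22  # 176 bits
CHILD_LEN = 5    # assumption: 5-byte reply blocks, matching the example values


def new_thread_index() -> str:
    """Start a new thread: 176 random bits, base64-encoded (32 characters with padding)."""
    return base64.b64encode(secrets.token_bytes(HEADER_LEN)).decode("ascii")


def reply_thread_index(parent: str) -> str:
    """Append a random block to the parent's Thread-Index."""
    parent = parent.strip()
    raw = base64.b64decode(parent + "=" * (-len(parent) % 4))
    block = bytearray(secrets.token_bytes(CHILD_LEN))
    # Clearing the top four bits keeps the parent's base64 text (minus '='
    # padding) a prefix of the reply's, whatever the parent's length.
    block[0] &= 0x0F
    return base64.b64encode(raw + bytes(block)).decode("ascii")
```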
Update: Microsoft has a specification on the web and Meridian Discovery has reverse engineered what's actually done. The format is as described above, but there's more detail. The initial 176 bits aren't just 176 random bits, but rather 176 more or less random bits made in a specific manner. I wouldn't trust senders to do that according to spec, though, and will still treat the value as 176+n×40 random bits, where n is the number of 5-byte blocks added by replies.