Threading email using Thread-Index
Microsoft Exchange sends email containing a header field called Thread-Index that does much the same job as References. I've no idea why Exchange does that instead of the normal way. But I have found out how to parse it, and it's not terribly difficult. It's easiest to explain using examples, so here are the Thread-Index fields from four messages:
Thread-Index: Aca8OXuAU3E0OYfxS/CjgSLFGePpiQAdZqFQACzEh/AAmOpSkA==
Thread-Index: Ace2UdoJVaeQeVVpSp2ZxZp3q7pBrg==
Thread-Index: Aca6Q3T3KOXW2RS5EX9R13340HQWLP==
Thread-Index: AcgB760UE69bP0PFT52+hzoqOvhDDQAAAo8g
The first 30 characters are different for all of these, which means that the messages belong to different threads. However, three out of the following four messages belong to the same thread:
Thread-index: Acnk7AtURXtWl9jTRdGX7mDGIxfHtw==
Thread-Index: AcZJXMEg7BHm/IX0SSuMswWV9Kglbg==
Thread-index: Acnk7AtURXtWl9jTRdGX7mDGIxfHtwAAAo8g
Thread-index: Acnk7AtURXtWl9jTRdGX7mDGIxfHtwAAAo8gqFQACz
The value is really base64-encoded entropy, but parsing and using it doesn't require decoding. Just take the first 30 characters (which cover the first 176 bits, i.e. 22 bytes, of the value). If they are the same for two messages, then those messages belong to the same thread. Within a thread, a reply's Thread-Index is the same as that of its parent (ignoring any trailing = padding), with a random suffix added. In my second example, the fourth message is a reply to the third, and the third is a reply to the first.
Here is some code that converts Thread-Index to References.
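What follows is not the code referred to above but a minimal Python sketch of how such a conversion might look. It applies the two rules from the text literally: the first 30 characters identify the thread, and a message's ancestors are the messages whose Thread-Index (with = padding stripped) is a proper prefix of its own. The function names, the EXAMPLE dictionary and the Message-IDs in it are hypothetical; only the Thread-Index values come from the second example.

```python
from typing import Dict, List


def normalize(thread_index: str) -> str:
    """Strip whitespace and base64 '=' padding so prefixes compare cleanly."""
    return thread_index.strip().rstrip("=")


def thread_key(thread_index: str) -> str:
    """The first 30 characters identify the thread."""
    return normalize(thread_index)[:30]


def build_references(messages: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each Message-ID to the Message-IDs of its ancestors, oldest first.

    `messages` maps Message-ID -> raw Thread-Index value. An ancestor is any
    other message whose normalized Thread-Index is a proper prefix of ours.
    """
    normalized = {mid: normalize(ti) for mid, ti in messages.items()}
    references: Dict[str, List[str]] = {}
    for mid, ti in normalized.items():
        ancestors = [
            other
            for other, other_ti in normalized.items()
            if other != mid
            and 30 <= len(other_ti) < len(ti)
            and ti.startswith(other_ti)
        ]
        # Shorter Thread-Index means closer to the root of the thread.
        ancestors.sort(key=lambda other: len(normalized[other]))
        references[mid] = ancestors
    return references


def format_references(ancestor_ids: List[str]) -> str:
    """Render a References header line from a list of Message-IDs."""
    return "References: " + " ".join(ancestor_ids)


# Hypothetical Message-IDs paired with the Thread-Index values from the
# second example in the text.
EXAMPLE = {
    "<a@example.invalid>": "Acnk7AtURXtWl9jTRdGX7mDGIxfHtw==",
    "<b@example.invalid>": "AcZJXMEg7BHm/IX0SSuMswWV9Kglbg==",
    "<c@example.invalid>": "Acnk7AtURXtWl9jTRdGX7mDGIxfHtwAAAo8g",
    "<d@example.invalid>": "Acnk7AtURXtWl9jTRdGX7mDGIxfHtwAAAo8gqFQACz",
}

if __name__ == "__main__":
    for message_id, ancestors in build_references(EXAMPLE).items():
        print(message_id, format_references(ancestors) if ancestors else "(no References)")
```

This only finds ancestors that are actually present in the set of messages passed in. If a parent is missing, the reply still lands in the right thread via thread_key, but its References list will have a gap.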
Generating Thread-Index is almost as simple: you just need 176 bits of high-quality entropy and a base64 encoder. Why bother, though? Exchange doesn't dominate the market any more. Gmail is king of the hill now and it sends References, so Exchange has to parse References just like the rest of us.
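For completeness, generating a value for a new thread as described above could be sketched in Python like this. It follows the description literally (22 random bytes through a base64 encoder) and makes no attempt to mimic whatever internal structure Exchange's own values may have. The function name new_thread_index is hypothetical.

```python
import base64
import secrets


def new_thread_index() -> str:
    """Return a Thread-Index value for a new thread.

    22 bytes is 176 bits; base64 turns that into 30 significant
    characters followed by '==' padding.
    """
    entropy = secrets.token_bytes(22)  # cryptographically strong random bytes
    return base64.b64encode(entropy).decode("ascii")


if __name__ == "__main__":
    print("Thread-Index: " + new_thread_index())
```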