Arnt Gulbrandsen
2017-12-13

The one-minute guide to implementing unicode email addresses

The unicode email address extensions are pleasantly simple to implement. Here is an overview of the RFCs and some notes I made while doing my first implementations; this posting is a very brief description of the protocol and format extensions involved. Despite its brevity it's nearly complete, because these extensions are so simple.

Mail message format: UTF8 is now permitted everywhere. Messages no longer need RFC2047 encoding, quoted-printable and the like; headers and body can simply contain UTF8.

To: Jøran Øygårdvær <jøran@blåbærsyltetøy.gulbrandsen.priv.no>
Subject: Høy på pæra
Content-Type: text/plain; charset=utf8

Gørrlei av eksempler.

No encoding is necessary anywhere. The message above lacks From and Date; apart from that it's correct.
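As a rough sketch of the same idea in Python: the standard library's email package can produce such a message if it is given the SMTPUTF8 policy. The function name build_message is made up for this illustration, and the explicit 8bit transfer encoding is my choice here, not something the format requires.

```python
from email.message import EmailMessage
from email.policy import SMTPUTF8


def build_message():
    """Build the example message with raw UTF8 in headers and body.

    Like the example in the text, this leaves out From and Date.
    """
    # The SMTPUTF8 policy tells the email package to write headers as
    # UTF8 instead of falling back to RFC2047 encoded words.
    msg = EmailMessage(policy=SMTPUTF8)
    msg["To"] = "Jøran Øygårdvær <jøran@blåbærsyltetøy.gulbrandsen.priv.no>"
    msg["Subject"] = "Høy på pæra"
    # Ask for an 8bit body so the text is not quoted-printable or base64.
    msg.set_content("Gørrlei av eksempler.\n", charset="utf-8", cte="8bit")
    return msg.as_bytes()


raw = build_message()
```

The resulting bytes should contain the header and body text as plain UTF8, with no encoded words.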

Sending mail using SMTP: The server advertises the SMTPUTF8 extension, the MAIL FROM command includes the argument SMTPUTF8, and the email addresses can then use UTF8.

$ telnet mx.example.com 25
Trying 2001:6d8::4269…
Connected to mx.example.com
Escape character is '^]'.
220 mx.example.com ESMTP Postfix (3.0.0)
ehlo myhostname
250-mx.example.com
250-PIPELINING
…
250 SMTPUTF8
mail from:<> smtputf8
250 2.1.0 Ok
rcpt to:<jøran@blåbærsyltetøy.gulbrandsen.priv.no>
250 2.1.5 Ok
data
354 End data with <CR><LF>.<CR><LF>

Note that the EHLO argument is sent before the client knows whether the server supports SMTPUTF8. It's best to use ASCII-only EHLO arguments.

The SMTPUTF8 argument to MAIL FROM has two purposes: Notify the mail server that one or more addresses may contain UTF8, and make sure that the recipient software does not receive a message it will not be able to parse.

Thus, if you send a message to आर्न्ट@यूनिवर्सल.भारत with a cc to example@example.com and the mail software at example.com does not support SMTPUTF8, then only आर्न्ट@यूनिवर्सल.भारत will receive the message. The mail server for example.com will reject the message. This is intentional.
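The client side of that exchange might look roughly like this in Python, using the standard smtplib module. The helper name send_utf8 is invented for this sketch; it assumes the caller has already connected to the server and passes the message as bytes.

```python
import smtplib


def send_utf8(smtp, sender, recipients, raw_message):
    """Send raw_message (bytes) with the SMTPUTF8 argument on MAIL FROM.

    smtp is assumed to be a connected smtplib.SMTP object. Returns the
    dictionary of refused recipients that sendmail() returns.
    """
    # EHLO is sent before we know what the server supports.
    smtp.ehlo_or_helo_if_needed()
    if not smtp.has_extn("smtputf8"):
        # Refuse rather than pass UTF8 addresses to a server that
        # would not know what to do with them.
        raise smtplib.SMTPNotSupportedError("server does not advertise SMTPUTF8")
    return smtp.sendmail(sender, recipients, raw_message,
                         mail_options=["SMTPUTF8"])
```

A caller could open the connection with smtplib.SMTP("mx.example.com", 25, local_hostname="myhostname"), which keeps the EHLO argument ASCII-only, and then call send_utf8 with an empty sender string to get mail from:<> as in the session above.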

An MTA needs to do an IDN conversion (e.g. from blåbærsyltetøy.gulbrandsen.priv.no to xn--blbrsyltety-y8ao3x.gulbrandsen.priv.no) as part of MX lookup; a client that connects to its local server doesn't need even that.
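A minimal sketch of that conversion in Python, using the built-in idna codec. The function name ascii_domain is made up for this example. Note that the built-in codec implements the older IDNA 2003 rules, so a real MTA may prefer a newer IDNA library; this only illustrates the step.

```python
def ascii_domain(address):
    """Return the ASCII (xn--) form of the domain part of an address.

    Only the domain is converted. The localpart is never touched; it is
    sent as UTF8 on the wire.
    """
    localpart, at, domain = address.rpartition("@")
    if not at:
        raise ValueError("not an email address: %r" % address)
    # The codec converts each non-ASCII label to its xn-- form and
    # leaves ASCII labels alone.
    return domain.encode("idna").decode("ascii")


mx_name = ascii_domain("jøran@blåbærsyltetøy.gulbrandsen.priv.no")
```

The resulting name is what goes into the MX lookup. ASCII-only domains such as example.com pass through unchanged.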

Access using IMAP: The server advertises the ENABLE extension, the client sends ENABLE UTF8=ACCEPT (that's legal even if the server advertises only ENABLE), the server acknowledges having enabled UTF8=ACCEPT, and from that point, both server and client can use UTF8 for any quoted string, including folder names, search strings and addresses.

$ telnet imap.example.com 143
Trying 2001:6d8::6942…
Connected to imap.example.com.
Escape character is '^]'.
* OK [CAPABILITY … ENABLE …
a login arnt pils
a OK [CAPABILITY … ENABLE …UTF8=ACCEPT …
b enable utf8=accept
* ENABLED UTF8=ACCEPT
b OK done
c select "Gørrlei"
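The same steps can be sketched with Python's standard imaplib module, which has an enable() method for this. The helper name enable_utf8 is invented here, and the sketch assumes the connection is already logged in.

```python
import imaplib


def enable_utf8(imap):
    """Try to switch a logged-in IMAP connection to UTF8=ACCEPT.

    imap is assumed to be an imaplib.IMAP4 (or IMAP4_SSL) object on which
    login() has already succeeded. Returns True if the server said OK.
    """
    # ENABLE UTF8=ACCEPT is legal whenever the server advertises ENABLE.
    if "ENABLE" not in imap.capabilities:
        return False
    status, _data = imap.enable("UTF8=ACCEPT")
    return status == "OK"
```

If this returns True, imaplib sends later commands as UTF8, so something like imap.select('"Gørrlei"') should work with the folder name written directly.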

Testing: Gmail supports this for SMTP, IMAP and webmail. The jøran@… address is an autoresponder; you can send mail to it and will receive a reply in a few seconds. Blåbærsyltetøy means blueberry jam and includes all three special letters used in Norwegian, æ, ø and å, so it's often used as a test word.

There are more details, but this is 90% of what's needed to write a correct implementation.

2017-08-17

Tokyo Martini

There are several drinks by this name around the web. The others are poor imitations, please disregard.

You'll need a green tea bitter: Vodka in which some green tea leaves have been steeped for a while. I prefer darjeeling leaves for a day, perhaps even briefer.

Make as a very dry martini with a few drops of the green tea bitter, and a thin slice of ginger. Enjoy.

I'm not sure which martini variant I like better: the Webster F. Street layaway plan or this one. Try both.

2017-05-16

I give up

The newspaper yesterday drove me over the edge. I admit it: The word hacker hasn't been usable for someone like me for a long time, I just didn't want to admit it.


A long, long time ago I had a keycard that didn't do what I needed. The keycard system was a compromise between what the organisation needed and what the keycard vendor could deliver, and it was mostly okay. Not 100%.

With my usual luck, one of the exceptions turned out to apply to myself. I couldn't get all the access that I needed while also being locked out of everything else. But I'm a hacker, so I poked around a little and made myself my own card, with enough access. Problem solved.

I didn't hide the card (you could see that it was homemade) (more…)

2017-03-01

Using procmail as an autoresponder

Procmail is old and almost forgotten, but still works well. This short script is the autoresponder for jøran@blåbærsyltetøy.gulbrandsen.priv.no:

:0 c
/tmp/jøranmail

:0
* !^FROM_DAEMON
* !^X-Loop:
* !^Auto-Submitted:
| (formail -r -t -I"From: Jøran Øygårdvær <jøran@blåbærsyltetøy.gulbrandsen.priv.no>" -A"Auto-Submitted: auto-replied" -A"Mime-Version: 1.0" -A"Content-Type: text/plain; charset=utf8" -A"Content-Transfer-Encoding: 8bit" ; echo "Liker du blåbærsyltetøy? Jeg synes blåbærsyltetøy er veldig godt." ; echo ; echo "-- " ; echo "Jøran") | /usr/sbin/sendmail -t

The first clause stores all incoming mail in /tmp/jøranmail just in case it's needed for debugging. The second clause filters out three kinds of mail that should not receive an autoreply. For messages that pass all three hurdles, it runs formail with many arguments to create an EAI-compliant autoreply header, echo to write a brief reply, and sends the result back.
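For readers who don't speak procmail, the three conditions can be approximated in Python. This is only a sketch: the function names are made up, and the daemon check is a much cruder stand-in for procmail's built-in FROM_DAEMON expression, which matches many more headers and sender names.

```python
from email import message_from_string


def should_autoreply(msg):
    """Return True if msg passes the three hurdles of the recipe above."""
    # Mirrors "* !^X-Loop:" and "* !^Auto-Submitted:": the mere presence
    # of either header suppresses the reply.
    if msg["X-Loop"] is not None or msg["Auto-Submitted"] is not None:
        return False
    # Crude approximation of "* !^FROM_DAEMON" (assumption, see lead-in):
    # skip bulk/list mail and mail from daemon-like senders.
    precedence = str(msg.get("Precedence", "")).strip().lower()
    if precedence in {"junk", "bulk", "list"}:
        return False
    sender = str(msg.get("From", "")).lower()
    return not any(name in sender for name in ("mailer-daemon", "postmaster"))


def should_autoreply_text(raw_text):
    """Convenience wrapper that parses a message from a string first."""
    return should_autoreply(message_from_string(raw_text))
```

The Auto-Submitted check is the one that matters most between autoresponders: the reply carries Auto-Submitted: auto-replied, so two such responders will not answer each other forever.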

It may not be terribly readable, but it's brief and reliable. That glass is two thirds full, not one third empty.

2017-01-25

Audience and goal

Written texts have two major invisible properties: audience and goal. I can't remember who taught me about that, but I taught it to my friend Abhijit Menon-Sen when we started working together, and the texts he and I have written together over the years always have a hidden comment describing the audience(s) and goal(s) for that text. That's why those texts are crisp and to the point.

Here's an example. Not by us together, this one is his alone.

Abhijit and his girlfriend Hassath have recently built a house, it's aaaaalmost done now, and a few weeks ago Abhijit wrote a blog posting about an electric power gadget they bought for the house. The audience for that posting consists of two groups of people: Friends who want to know how the house is coming along, and people who are searching for reviews of the gadget before possibly buying one themselves. The goal for the first group is to describe the power problems and how Abhijit and Hassath are coping, and for the second group, to tell them whether and how well that particular gadget helps with that kind of power problem.

Now please read about their amazingly unreliable power supply and consider how each sentence, each paragraph and each picture helps with either or both of those goals, in the eyes of those audiences. Does a sentence say something that both audiences already know, or does it tell either or both audiences something Abhijit wants to tell them? Does a sentence help with one goal but disturb the other? What does the photo achieve? Would mentioning the audiences or goals in the text help to achieve either of the goals, or would it distract or detract?

That posting may be stylistically vapid, but it achieves Abhijit's goals and that makes it good writing. The rest is a mere question of how good.

Now please start formulating an explicit audience and goal before you write your own email, documentation, almost anything. Help save the world from pointless blather and documentation that forgets the critical points.

2017-01-23

A short digression on humane architecture

This stadium was designed by Zaha Hadid and meant to be built in Tokyo:

I don't actually know, but I expect that the committee felt that it might be a landmark like the Sydney Opera House, famous, forever photographed. The committee scrapped it, though, and decided to build this, instead, and it makes me so happy:

Much more humane. By comparison, Zaha Hadid's white wonder looks like a science-fiction paperback cover. It doesn't look bad, (more…)

2017-01-20

A new email address

I think mail to आर्न्ट@यूनिवर्सल.भारत should now land in my inbox... wonder how long it'll take before the first spammer manages to spam that address.

2017-01-19

Three programs, one feature

It's not something one does often, but I've implemented the same feature in three different programs. Not very different, all are written in the same programming language for the same platform, and all are servers.

Same platform, same language, same task, same developer... you would think the three patches would end up looking similar? They did not, not at all.

The feature is support for using UTF8 on SMTP, which I've implemented for Postfix, Sendmail and Qmail, all of which run on Linux/POSIX systems. I tried to follow the code style of each, and surprised myself at how different my code looked.

One patch is well-engineered, prim and proper.

The next is for an amorphous blob of software. The patch is itself amorphous, and makes functions that were already too long even longer. Yet it's half as long as the first patch. The first patch is the more readable of the two, this one the shorter, so in my own judgment they're roughly tied overall. This surprised me not a little.

The third is a short, readable patch which one might call an inspired hack. It's much smaller than the others and easily wins on readability too.

It wasn't supposed to be like that, was it? Good engineering shouldn't give the most verbose patch, and the hack shouldn't be the most lucid of the three.

I see two things here:

First: Proper engineering has its value, but perhaps not as much as common wisdom says. Moderately clean code offers almost all of the value of really clean code.

Second: A small program, such as the MVPs that are so fashionable these days, is easy to work with. But ease of modification isn't everything: the smallest of the three servers has fallen out of use because the world changed and it stopped being viable.

Some random verbiage on each of the three servers and patches: (more…)