Arnt Gulbrandsen

The one-minute guide to implementing unicode email addresses

The unicode email address extensions are pleasantly simple to implement. Here is an overview of the RFCs and some notes I made while doing my first implementations; this posting is a very brief description of the protocol and format extensions involved. Despite its brevity it's nearly complete, because these extensions are so simple.

Mail message format: Using UTF8 everywhere is now permitted. Instead of using RFC2047 encoding, quoted-printable and more, messages can use UTF8 everywhere.

    To: Jøran Øygårdvær <jøran@blåbærsyltetø>
    Subject: Høy på pæra
    Content-Type: text/plain; charset=utf8

    Gørrlei av eksempler.

No encoding is necessary anywhere. The message above lacks From and Date; apart from that it's correct.
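As a rough illustration, Python's standard email package can produce a message of this kind. This is only a sketch and not part of my original notes: the addresses use the reserved .example domain and are placeholders, and build_message is a hypothetical helper name. The SMTPUTF8 policy asks the library to leave header text as raw UTF8 rather than applying RFC2047 encoding.

```python
from email.message import EmailMessage
from email.policy import SMTPUTF8
from email.utils import formatdate


def build_message() -> EmailMessage:
    """Build a small message whose headers and body contain raw UTF-8."""
    # policy=SMTPUTF8 keeps non-ASCII header text as UTF-8 when serializing.
    msg = EmailMessage(policy=SMTPUTF8)
    # Placeholder addresses; .example is reserved for documentation.
    msg["From"] = "Arnt <arnt@blåbærsyltetøy.example>"
    msg["To"] = "Jøran Øygårdvær <jøran@blåbærsyltetøy.example>"
    msg["Date"] = formatdate(localtime=True)
    msg["Subject"] = "Høy på pæra"
    msg.set_content("Gørrlei av eksempler.")
    return msg


# The serialized bytes are what would go on the wire after DATA.
raw = build_message().as_bytes()
```

Decoding raw as UTF8 and printing it should show the Subject line exactly as typed, with no =?utf-8?...?= wrappers.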

Sending mail using SMTP: The server advertises the SMTPUTF8 extension, the MAIL FROM command includes the argument SMTPUTF8, and the email addresses can then use UTF8.

    $ telnet 25
    Trying 2001:6d8::4269…
    Connected to
    Escape character is '^]'.
    220 ESMTP Postfix (3.0.0)
    ehlo myhostname
    250-PIPELINING
    …
    250 SMTPUTF8
    mail from:<> smtputf8
    250 2.1.0 Ok
    rcpt to:<jøran@blåbærsyltetø>
    250 2.1.5 Ok
    data
    354 End data with .

Note that the EHLO argument is sent before the client knows whether the server supports SMTPUTF8. It's best to use ASCII-only EHLO arguments.

The SMTPUTF8 argument to MAIL FROM has two purposes: Notify the mail server that one or more addresses may contain UTF8, and make sure that the recipient software does not receive a message it will not be able to parse.
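A client-side sketch of that decision, using Python's smtplib, might look like the following. The helper names needs_smtputf8 and send are my own inventions for illustration, and the host, sender and recipients are whatever the caller supplies. The idea is to request SMTPUTF8 when either an envelope address or the message header block contains non-ASCII, and to refuse to send if the server does not advertise the extension.

```python
import smtplib


def needs_smtputf8(addresses, raw_message: bytes) -> bool:
    """True if any address or the message's header block is non-ASCII."""
    # The header block ends at the first empty line.
    header_block = raw_message.replace(b"\r\n", b"\n").split(b"\n\n", 1)[0]
    return (any(not addr.isascii() for addr in addresses)
            or not header_block.isascii())


def send(host, sender, recipients, raw_message: bytes):
    """Send raw_message, adding SMTPUTF8 to MAIL FROM only when needed."""
    wanted = needs_smtputf8([sender, *recipients], raw_message)
    options = ["SMTPUTF8"] if wanted else []
    # local_hostname becomes the EHLO argument, so keep it ASCII-only.
    with smtplib.SMTP(host, 25, local_hostname="myhostname") as smtp:
        smtp.ehlo()
        if wanted and not smtp.has_extn("smtputf8"):
            raise RuntimeError("server does not advertise SMTPUTF8")
        # sendmail returns a dict of recipients the server refused, if any.
        return smtp.sendmail(sender, recipients, raw_message,
                             mail_options=options)
```

Note that a message with ASCII headers and a UTF8 body does not trigger the extension in this sketch; only addresses and headers do.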

Thus, if you send a message to आर्न्ट@यूनिवर्सल.भारत with a cc to a second address, and the mail software for that second address's domain does not support SMTPUTF8, then only आर्न्ट@यूनिवर्सल.भारत will receive the message. The mail server for the second domain will reject the message. This is intentional.

An MTA needs to do an IDN conversion (e.g. from blåbærsyltetøy.gulbrandsen. to xn--blbrsyltety-y8ao3x.gulbrandsen.) as part of MX lookup. A client that connects to its local server doesn't need even that.
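The conversion itself is a one-liner in most languages. Here is a minimal Python sketch using the built-in idna codec, which implements the older IDNA 2003 rules; an MTA that needs IDNA 2008 behaviour would use a different library. The function names and the .example address are placeholders of mine.

```python
def ascii_domain(domain: str) -> str:
    """Convert each non-ASCII label of a domain to its xn-- form."""
    # Python's built-in "idna" codec follows IDNA 2003 (RFC 3490).
    return domain.encode("idna").decode("ascii")


def mx_lookup_name(address: str) -> str:
    """Return the ASCII domain name to use in an MX query for an address."""
    # Split at the last "@"; the local part is never converted.
    _local, _at, domain = address.rpartition("@")
    return ascii_domain(domain)


name = mx_lookup_name("jøran@blåbærsyltetøy.example")
```

Only the domain is converted. The local part stays in UTF8 and is passed to the next server as-is.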

Access using IMAP: The server advertises the ENABLE extension, the client sends ENABLE UTF8=ACCEPT (that's legal even if the server advertises only ENABLE), the server acknowledges having enabled UTF8=ACCEPT, and from that point, both server and client can use UTF8 for any quoted string, including folder names, search strings and addresses.

    $ telnet 143
    Trying 2001:6d8::6942…
    Connected to
    Escape character is '^]'.
    * OK [CAPABILITY … ENABLE …
    a login arnt pils
    a OK [CAPABILITY … ENABLE …UTF8=ACCEPT …
    b enable utf8=accept
    * ENABLED UTF8=ACCEPT
    b OK done
    c select "Gørrlei"
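The same exchange can be sketched with Python's imaplib, whose enable() method switches the connection to UTF8 once the server accepts UTF8=ACCEPT. The helper names below are hypothetical, the host and credentials come from the caller, and I am assuming that imaplib passes an already-quoted mailbox name through unchanged.

```python
import imaplib


def quote_mailbox(name: str) -> str:
    """Wrap a mailbox name in an IMAP quoted string.

    Backslashes and double quotes inside the name are escaped.
    """
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def select_utf8_mailbox(host, user, password, mailbox):
    """Log in, enable UTF8=ACCEPT, then select a mailbox by its UTF-8 name."""
    imap = imaplib.IMAP4(host, 143)
    imap.login(user, password)
    # enable() raises imaplib.IMAP4.error if the server lacks ENABLE.
    typ, _data = imap.enable("UTF8=ACCEPT")
    if typ != "OK":
        raise RuntimeError("server refused UTF8=ACCEPT")
    # From here on, quoted strings may contain UTF-8.
    return imap, imap.select(quote_mailbox(mailbox))
```

Calling select_utf8_mailbox(host, "arnt", password, "Gørrlei") would correspond to the a, b and c commands in the session above.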

Testing: Gmail supports this for SMTP, IMAP and webmail. The jøran@… address is an autoresponder: you can send mail to it and will receive a reply in a few seconds. Blåbærsyltetøy means blueberry jam and includes all three of the special letters used in Norwegian (æ, ø and å), so it's often used as a test word.

There are more details, but this is 90% of what's needed to write a correct implementation.