The one-minute guide to implementing unicode email addresses
The unicode email address extensions are pleasantly simple to implement. Here is an overview of the RFCs and some notes I made while doing my first implementations; this posting is a very brief description of the protocol and format extensions involved. Despite its brevity it's nearly complete, because these extensions are so simple.
Mail message format: UTF8 is now permitted everywhere. Messages no longer need RFC2047 encoded-words in headers, quoted-printable bodies and the like; they can use UTF8 directly.
To: Jøran Øygårdvær <jøran@blåbærsyltetøy.gulbrandsen.priv.no>
Subject: Høy på pæra
Content-Type: text/plain; charset=utf-8

Gørrlei av eksempler.
No encoding is necessary anywhere. The message above lacks From and Date; apart from that, it's correct.
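For readers who want to produce such a message programmatically, here is a minimal sketch using Python's standard email package. It assumes Python 3.6 or later; the variable names are mine, and the library adds its own MIME-Version and Content-Transfer-Encoding headers, so the output is not byte-identical to the example above.

```python
from email.message import EmailMessage
from email.policy import SMTPUTF8

# Build the example message; From and Date are still missing, as above.
msg = EmailMessage()
msg["To"] = "Jøran Øygårdvær <jøran@blåbærsyltetøy.gulbrandsen.priv.no>"
msg["Subject"] = "Høy på pæra"
msg.set_content("Gørrlei av eksempler.\n")

# Serialising with the SMTPUTF8 policy keeps headers as raw UTF-8
# instead of falling back to RFC2047 encoded-words.
raw = msg.as_bytes(policy=SMTPUTF8)
```

The resulting bytes contain the subject and address as plain UTF8, with no encoded-words anywhere.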
Sending mail using SMTP: The server advertises the SMTPUTF8 extension, the MAIL FROM command includes the argument SMTPUTF8, and the email addresses can then use UTF8.
$ telnet mx.example.com 25
Trying 2001:6d8::4269…
Connected to mx.example.com.
Escape character is '^]'.
220 mx.example.com ESMTP Postfix (3.0.0)
ehlo myhostname
250-mx.example.com
250-PIPELINING
…
250 SMTPUTF8
mail from:<> smtputf8
250 2.1.0 Ok
rcpt to:<jøran@blåbærsyltetøy.gulbrandsen.priv.no>
250 2.1.5 Ok
data
354 End data with <CR><LF>.<CR><LF>
…
Note that the EHLO argument is sent before the client knows whether the server supports SMTPUTF8. It's best to use ASCII-only EHLO arguments.
The SMTPUTF8 argument to MAIL FROM has two purposes: to notify the mail server that one or more addresses may contain UTF8, and to make sure that the recipient software never receives a message it cannot parse.
Thus, if you send a message to आर्न्ट@यूनिवर्सल.भारत with a cc to example@example.com and the mail software at example.com does not support SMTPUTF8, then only आर्न्ट@यूनिवर्सल.भारत will receive the message. The mail server for example.com will reject the message. This is intentional.
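The client-side decision described above can be sketched roughly as follows. This is an illustration under my own assumptions, not code from any particular MTA: the function name, its parameters and the choice to raise an error are all hypothetical, and `ehlo_keywords` stands for the keyword lines from the EHLO response with the "250-" prefix already removed.

```python
def mail_from_command(sender, recipients, ehlo_keywords, utf8_headers=False):
    """Return the MAIL FROM line, adding SMTPUTF8 only when it is needed."""
    # SMTPUTF8 is needed if any address (or, by the caller's say-so,
    # any header) contains non-ASCII characters.
    needs_utf8 = utf8_headers or any(
        not addr.isascii() for addr in [sender, *recipients]
    )
    if not needs_utf8:
        return f"MAIL FROM:<{sender}>"

    # The first word of each EHLO line is the extension keyword.
    offered = {kw.split()[0].upper() for kw in ehlo_keywords if kw.strip()}
    if "SMTPUTF8" not in offered:
        # The server cannot parse the message, so it must not be sent there.
        raise ValueError("server does not advertise SMTPUTF8")
    return f"MAIL FROM:<{sender}> SMTPUTF8"
```

With the transcript above, `mail_from_command("", ["jøran@blåbærsyltetøy.gulbrandsen.priv.no"], ["PIPELINING", "SMTPUTF8"])` gives `MAIL FROM:<> SMTPUTF8`. If you use Python's smtplib rather than writing the protocol yourself, `has_extn("smtputf8")` and `mail_options=["SMTPUTF8"]` on `sendmail()` cover the same ground.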
An MTA needs to do an IDN conversion (e.g. from blåbærsyltetøy.gulbrandsen.priv.no to xn--blbrsyltety-y8ao3x.gulbrandsen.priv.no) as part of MX lookup. A client that connects to its local server doesn't need even that.
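As a rough illustration of that conversion, the sketch below uses Python's built-in "idna" codec. Note the assumptions: that codec implements the older IDNA 2003 rules, which is enough for this example but may differ from what a production MTA uses, and the function name is mine. Only the domain is converted; the localpart stays UTF8.

```python
def mx_lookup_name(address):
    """Return the ASCII (A-label) form of the address's domain for DNS."""
    # Split on the last "@"; the localpart is never IDN-converted.
    _local, sep, domain = address.rpartition("@")
    if not sep or not domain:
        raise ValueError("address has no domain part")
    # The stdlib codec converts each label separately (IDNA 2003 rules).
    return domain.encode("idna").decode("ascii")


name = mx_lookup_name("jøran@blåbærsyltetøy.gulbrandsen.priv.no")
# name == "xn--blbrsyltety-y8ao3x.gulbrandsen.priv.no"
```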
Access using IMAP: The server advertises the ENABLE extension, and the client sends ENABLE UTF8=ACCEPT (that's legal even if the server advertises only ENABLE). The server acknowledges having enabled UTF8=ACCEPT, and from that point on both server and client can use UTF8 in any quoted string, including folder names, search strings and addresses.
$ telnet imap.example.com 143
Trying 2001:6d8::6942…
Connected to imap.example.com.
Escape character is '^]'.
* OK [CAPABILITY … ENABLE …
a login arnt pils
a OK [CAPABILITY … ENABLE …UTF8=ACCEPT …
b enable utf8=accept
* ENABLED UTF8=ACCEPT
b OK done
c select "Gørrlei"
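A client should only start sending UTF8 in quoted strings once it has seen the server's acknowledgement. Here is a small sketch of that check; the function name is hypothetical, and it assumes the response lines have already been read from the connection and decoded to strings.

```python
def utf8_accept_enabled(responses):
    """Return True if an untagged ENABLED response lists UTF8=ACCEPT."""
    for line in responses:
        # IMAP keywords are case-insensitive, so compare in upper case.
        words = line.upper().split()
        if words[:2] == ["*", "ENABLED"] and "UTF8=ACCEPT" in words[2:]:
            return True
    return False


# The responses to "b enable utf8=accept" in the transcript above.
ok = utf8_accept_enabled(["* ENABLED UTF8=ACCEPT", "b OK done"])
```

Python's imaplib offers `IMAP4.enable("UTF8=ACCEPT")` for the same purpose, if you'd rather not parse the responses yourself.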
Testing: Gmail supports this for SMTP, IMAP and webmail. The jøran@… address is an autoresponder: you can send mail to it and will receive a reply in a few seconds. Blåbærsyltetøy means blueberry jam and includes all three of the special letters used in Norwegian (æ, ø and å), so it's often used as a test word.
There are more details, but this is 90% of what's needed to write a correct implementation.