
Escape email subject line

I know that email subjects do not have to be HTML-escaped, since (as far as I understood) they are mail headers and not HTML text.

So writing &egrave; inside the subject line would show the literal &egrave; to the user.

I want to send some automated emails, and in some languages they contain non-ASCII characters.

Since my host's integrated editor (which I sometimes use for quick edits) does not support UTF-8, I prefer to use ASCII only and always escape everything (&agrave; for HTML, \xe0 for JS, and so on).

So, is there a way to escape email subjects using ASCII only, even though the recipient does support UTF-8?

&...; are HTML/XML entities and have nothing to do with email. You will not be able to reliably have these translated into the desired symbol, and I would consider anything that did translate them to be the result of a bug.

Also, there is no such thing as an "ASCII è". ASCII is a 7-bit encoding with no accented letters at all, and "extended ASCII" is a loose name for 8-bit encodings such as ISO-8859-x and/or Microsoft's cp125x code pages. If your client can't support anything other than unaccented English text, then that's all you can use.
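Since the real constraint in the question is a source file that must stay pure ASCII, note that this does not require entities at all: PHP string escapes can produce genuine UTF-8 bytes from ASCII-only source. A minimal sketch (the variable names are illustrative; the `\u{...}` escape needs PHP 7.0 or later):

```php
<?php
// Every byte of this file is plain ASCII, yet each variable below holds
// the UTF-8 encoding of "è" (the two bytes 0xC3 0xA8).
$viaCodepoint = "\u{e8}";   // Unicode code point escape (PHP 7.0+, double quotes only)
$viaBytes     = "\xC3\xA8"; // the UTF-8 bytes written as hex escapes
// If the text already contains HTML entities, decode them to UTF-8 first;
// never put the entity itself into the subject.
$viaEntity    = html_entity_decode('&egrave;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

$subject = "Welcome to the fancy \u{e8} club!"; // real UTF-8, ready to be encoded

var_dump($viaCodepoint === $viaBytes, $viaBytes === $viaEntity); // prints bool(true) twice
```

Unlike JavaScript, PHP's `"\xe0"` is a single raw byte rather than the code point U+00E0, so it gives ISO-8859-1 `à`, not UTF-8; use `"\u{e0}"` for the UTF-8 form.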

That said, while all email headers must, according to the spec, be 7-bit-safe ASCII text, there is a provision (RFC 2047 "encoded words") for encoding header text in other charsets: UTF-8, ISO-8859-x, Microsoft code pages, etc.

function encode_subject($input, $charset, $method='B') {
    switch($method) {
        case 'B':
            // Base64 of the raw bytes, which must already be in $charset
            $encoded = base64_encode($input);
            break;
        case 'Q':
            // RFC 2047 "Q" encoding is not plain quoted-printable: an encoded
            // word may not contain spaces (they become "_"), and "=", "?" and
            // "_" must be escaped too. Escape every byte outside a safe set as =XX.
            $encoded = preg_replace_callback(
                '/[^A-Za-z0-9!*+\-\/ ]/',
                function ($m) { return sprintf('=%02X', ord($m[0])); },
                $input
            );
            $encoded = str_replace(' ', '_', $encoded);
            break;
        default:
            throw new Exception('Unknown encoding method: ' . $method);
    }

    // RFC 2047 encoded word: =?charset?method?encoded-text?=
    return sprintf('=?%s?%s?%s?=', $charset, $method, $encoded);
}

$input     = 'Welcome to the fancy è club!'; // utf8
$utf8      = $input;
$iso8859_1 = mb_convert_encoding($input, 'iso-8859-1', 'utf-8');
$cp1252    = mb_convert_encoding($input, 'cp1252',     'utf-8');

var_dump(
    $utf8,
    encode_subject($utf8, 'utf-8', 'B'),
    encode_subject($utf8, 'utf-8', 'Q'),
    $iso8859_1,
    encode_subject($iso8859_1, 'iso-8859-1', 'B'),
    encode_subject($iso8859_1, 'iso-8859-1', 'Q'),
    $cp1252,
    encode_subject($cp1252, 'cp1252', 'B'),
    encode_subject($cp1252, 'cp1252', 'Q')
);

Output:

string(29) "Welcome to the fancy è club!"
string(52) "=?utf-8?B?V2VsY29tZSB0byB0aGUgZmFuY3kgw6ggY2x1YiE=?="
string(45) "=?utf-8?Q?Welcome_to_the_fancy_=C3=A8_club!?="

string(28) "Welcome to the fancy � club!"
string(57) "=?iso-8859-1?B?V2VsY29tZSB0byB0aGUgZmFuY3kg6CBjbHViIQ==?="
string(47) "=?iso-8859-1?Q?Welcome_to_the_fancy_=E8_club!?="

string(28) "Welcome to the fancy � club!"
string(53) "=?cp1252?B?V2VsY29tZSB0byB0aGUgZmFuY3kg6CBjbHViIQ==?="
string(43) "=?cp1252?Q?Welcome_to_the_fancy_=E8_club!?="
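The `encode_subject` function above emits a single encoded word per subject, which is fine for the short example. However, RFC 2047 limits an encoded word to 75 characters and a line containing encoded words to 76, and it forbids splitting a multi-byte character across two encoded words. Longer subjects therefore have to be cut into several encoded words on character boundaries. What follows is a sketch for the B method only, assuming the input is already in `$charset` and PHP 7.4+ (for `mb_str_split`); `encode_subject_folded` is a hypothetical name, not a library function.

```php
<?php
// Hypothetical helper: B-encode a subject as several encoded-words so that
// none breaks RFC 2047's limits (75 chars per encoded-word, 76 per line).
// Splits on character boundaries, never inside a multi-byte sequence.
// Assumes $input is already encoded in $charset.
function encode_subject_folded(string $input, string $charset = 'UTF-8'): string
{
    $prefix = '=?' . $charset . '?B?';
    $suffix = '?=';
    // The first line also carries "Subject: ", so budget for it everywhere.
    $budget   = min(75, 76 - strlen('Subject: ')) - strlen($prefix) - strlen($suffix);
    $maxB64   = intdiv($budget, 4) * 4;  // base64 comes in 4-char groups
    $maxBytes = intdiv($maxB64, 4) * 3;  // raw bytes that fit in $maxB64 chars

    $words = [];
    $chunk = '';
    foreach (mb_str_split($input, 1, $charset) as $char) {
        // Flush the current chunk before it would outgrow one encoded-word.
        if ($chunk !== '' && strlen($chunk) + strlen($char) > $maxBytes) {
            $words[] = $prefix . base64_encode($chunk) . $suffix;
            $chunk = '';
        }
        $chunk .= $char;
    }
    if ($chunk !== '') {
        $words[] = $prefix . base64_encode($chunk) . $suffix;
    }
    // Fold with CRLF + space; decoders ignore whitespace between encoded-words.
    return implode("\r\n ", $words);
}

$long = str_repeat("Welcome to the fancy \u{e8} club! ", 4);
echo encode_subject_folded($long), "\n";
```

Because each piece holds whole characters and the whitespace between adjacent encoded words is ignored when decoding, a compliant client reassembles the original subject.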

So whatever charset you're sending your emails in, use that to encode the subject as well. If your recipients are using old, busted mail clients that can't properly decode text in the language they probably speak, then they have much larger problems that you have nothing to do with.
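To keep the subject and the body in the same charset in practice, one option is to let mbstring's built-in RFC 2047 encoder, `mb_encode_mimeheader`, handle the subject (it also folds long subjects), and declare the matching charset for the body. A sketch follows; `prepare_mail` is a hypothetical helper, it assumes the input strings are UTF-8 and that `mb_internal_encoding()` is UTF-8 (the default), and the choice of quoted-printable for the body is an assumption made for 7-bit safety, not something the answer prescribes.

```php
<?php
// Hypothetical helper: prepare subject, body and headers for PHP's mail()
// so that the subject and the body declare the same charset.
// Assumes UTF-8 input and mb_internal_encoding() === 'UTF-8' (the default),
// since mb_encode_mimeheader converts from the internal encoding.
function prepare_mail(string $subject, string $body, string $charset = 'UTF-8'): array
{
    return [
        // mbstring's built-in RFC 2047 encoder; it also folds long subjects.
        mb_encode_mimeheader($subject, $charset, 'B'),
        // Body converted to the same charset, then made 7-bit safe.
        quoted_printable_encode(mb_convert_encoding($body, $charset, 'UTF-8')),
        [
            'MIME-Version'              => '1.0',
            'Content-Type'              => 'text/plain; charset=' . $charset,
            'Content-Transfer-Encoding' => 'quoted-printable',
        ],
    ];
}

[$subject, $body, $headers] = prepare_mail(
    "Welcome to the fancy \u{e8} club!",
    "Hello and welcome to the fancy \u{e8} club."
);
// mail('someone@example.com', $subject, $body, $headers); // array headers need PHP 7.2+
```

Passing a different `$charset` (for example `'ISO-8859-1'`) changes the subject encoding, the body conversion and the `Content-Type` declaration together, which is the point of bundling them.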

Hot Take

UTF-8 everywhere, for everything. Anything that doesn't support UTF-8 in 2020 is defective and not your problem. Unless your target market is people using Windows Me or a Palm Pilot from 2004, use UTF-8.
