
Japanese email subject encoding

Apparently, encoding Japanese emails is somewhat challenging, as I am slowly discovering myself. If there are any experts here (even limited experience will do), could I please have some guidelines on how to do it, how to test it, and how to verify it?

Bear in mind that I've never set foot anywhere near Japan; it is simply that the product I'm developing is used there, among other places.

What (I think) I know so far is the following:
- Japanese emails should be encoded in ISO-2022-JP (Japanese JIS, code page 50220) or possibly Shift_JIS (code page 932)
- Email transfer encoding should be set to Base64 for plain text and 7Bit for HTML
- The email subject should be encoded separately so that it starts with "=?ISO-2022-JP?B?" (I don't know what this is supposed to mean). I've tried encoding the subject with

"=?ISO-2022-JP?B?" + Convert.ToBase64String(Encoding.Unicode.GetBytes(subject))

which basically gives the encoded string as expected, but it doesn't get displayed as Japanese text in any email program
- I've tested in Outlook 2003, Outlook Express and Gmail

Any help would be greatly appreciated.


OK, so to post a short update: thanks to the two helpful answers, I've managed to get the right format and encoding. Now Outlook shows something that resembles the correct subject:
=?iso-2022-jp?B?6 Japanese test に各々の視点で語ってもらった。 6相当の防水?=

However, the exact same email in Outlook Express shows a subject like this:
=?iso-2022-jp?B?6 Japanese test 縺ォ蜷・・・隕也せ縺ァ隱槭▲縺ヲ繧ゅi縺」縺溘・ 6逶ク蠖薙・髦イ豌エ?=

Furthermore, when viewed in the Inbox view in Outlook Express, the email subject is even weirder, like this:
=?iso-2022-jp?B?6 Japanese test ??????????????? 6???????=

Gmail seems to behave in a similar fashion to Outlook, which looks correct.

I just can't get my head around this one.

I've been dealing with Japanese encodings for almost 20 years, so I can sympathize with your difficulties. Websites that I've worked on send hundreds of emails daily to Japanese customers, so I can share with you what has worked for us.

  • First of all, do not use Shift-JIS. I personally receive tons of Japanese emails and they are almost never encoded using Shift-JIS. I think an old (circa Win 98?) version of Outlook Express encoded outgoing mail using Shift-JIS, but nowadays you just don't see it.

  • As you've figured out, you need to use ISO-2022-JP as your encoding for at least anything that goes in the mail header. This includes the Subject, To line, and CC line. UTF-8 will also work in most cases, but it will not work on Yahoo Japan mail, and as you can imagine, many Japanese users use Yahoo Japan mail.

  • You can use UTF-8 in the body of the email, but it is recommended that you base64 encode the UTF-8 encoded Japanese text and put that in the body instead of raw UTF-8 text. However, in practice, I believe that raw UTF-8 text will work fine these days for the body of the email.

  • As I alluded to above, you need to test at least on Outlook (Exchange), Outlook Express (IMAP/POP3), and Yahoo Japan web mail. Yahoo Japan is the trickiest because I believe they use EUC for the encoding of their web pages, so you need to follow the correct standards for your emails or they won't work (ISO-2022-JP is the standard for sending Japanese emails).

  • Also, your subject line should not exceed 75 characters per line. That is, 75 characters after you've encoded it in ISO-2022-JP and base64, not 75 characters before conversion. If you exceed 75 characters, you need to break your encoded subject into multiple lines, each starting with "=?iso-2022-jp?B?" and ending with "?=". If you don't do this, your subject might get truncated (depending on the email reader, and also on the content of your subject text). According to RFC 2047:

"An 'encoded-word' may not be more than 75 characters long, including 'charset', 'encoding', 'encoded-text', and delimiters. If it is desirable to encode more text than will fit in an 'encoded-word' of 75 characters, multiple 'encoded-word's (separated by CRLF SPACE) may be used."

  • Here's some sample PHP code to encode the subject:

 // Convert Japanese subject to ISO-2022-JP (JIS is essentially ISO-2022-JP).
 // The third argument is the source encoding; change "SJIS" to match your input.
 $subject = mb_convert_encoding ($subject, "JIS", "SJIS");

 // Now, base64 encode the subject
 $subject = base64_encode ($subject);

 // Add the encoding markers to the subject
 $subject = "=?iso-2022-jp?B?" . $subject . "?=";

 // Now, $subject can be placed as-is into the raw mail header.
  • See RFC 2047 for a complete description of how to encode your email header.
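
The 75-character rule from the list above can be sketched in Python roughly as follows. This is a minimal illustration rather than part of the original answer: the helper names `encode_word` and `encode_subject` are hypothetical, and it assumes every character of the subject is representable in ISO-2022-JP (otherwise `str.encode` raises `UnicodeEncodeError`). Each chunk is converted separately, so every encoded-word starts and ends in ASCII mode and stays within 75 characters.

```python
import base64

PREFIX = "=?iso-2022-jp?B?"
SUFFIX = "?="
MAX_WORD = 75  # RFC 2047 limit for a single encoded-word


def encode_word(text):
    """Encode one chunk of text as a single RFC 2047 encoded-word."""
    raw = text.encode("iso-2022-jp")
    return PREFIX + base64.b64encode(raw).decode("ascii") + SUFFIX


def encode_subject(subject):
    """Split the subject on character boundaries into encoded-words of at most 75 characters."""
    words = []
    chunk = ""
    for ch in subject:
        # Start a new encoded-word when adding this character would exceed the limit
        if chunk and len(encode_word(chunk + ch)) > MAX_WORD:
            words.append(encode_word(chunk))
            chunk = ch
        else:
            chunk += ch
    if chunk:
        words.append(encode_word(chunk))
    # Encoded-words are separated by CRLF SPACE, as the RFC quote describes
    return "\r\n ".join(words)


if __name__ == "__main__":
    print(encode_subject("日本語のメール件名のテストです。" * 4))
```

The returned string is meant to be placed after `Subject: ` in the raw header; a receiving client that follows RFC 2047 should join the encoded-words back into one subject.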

Check http://en.wikipedia.org/wiki/MIME#Encoded-Word for a description of how to encode header fields in MIME-compliant messages. You seem to be missing a "?=" at the end of your subject.

=?ISO-2022-JP?B?TEXTTEXT...

ISO-2022-JP means that the string is encoded in the ISO-2022-JP code page (that is, not Unicode); B means that the string is base64 encoded.

In your example, you should just supply your string in ISO-2022-JP instead of Unicode.
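
To illustrate the mismatch, here is a small Python sketch (Python is used only for illustration; the sample text and the names `receiver_view`, `wrong_b64` and `right_b64` are made up). .NET's `Encoding.Unicode` is UTF-16 little-endian, so it is modelled here with the `utf-16-le` codec. Only the bytes produced with the ISO-2022-JP codec decode back to the original text when the receiver applies the charset named in the header.

```python
import base64

subject = "テスト"  # sample text, chosen only for this illustration

# What the code in the question effectively does: UTF-16LE bytes labelled as ISO-2022-JP
wrong_b64 = base64.b64encode(subject.encode("utf-16-le")).decode("ascii")

# What the "=?ISO-2022-JP?B?" label actually promises
right_b64 = base64.b64encode(subject.encode("iso-2022-jp")).decode("ascii")


def receiver_view(b64_text):
    """Decode the way a mail client would: base64 first, then the declared charset."""
    return base64.b64decode(b64_text).decode("iso-2022-jp", errors="replace")


print(receiver_view(right_b64) == subject)  # True
print(receiver_view(wrong_b64) == subject)  # False
```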

I have some experience composing and sending email in Japanese. Normally you have to be aware of which encoding the operating system uses and how you store your Japanese strings! My mail objects are normally encoded as follows:

    string s = "V‚µ‚¢ŠwK–@‚Ì‚²’ñˆÄ"; // Our japanese are shift-jis encoded, so it appears like garbled
    MailMessage message = new MailMessage();
    message.BodyEncoding = Encoding.GetEncoding("iso-2022-jp");
    message.SubjectEncoding = Encoding.GetEncoding("iso-2022-jp");
    message.Subject = s.ToEncoding(Encoding.GetEncoding("Shift-Jis")); // Change the encoding to whatever your source is
    message.Body = s.ToEncoding(Encoding.GetEncoding("Shift-Jis")); // Change the encoding to whatever your source is

Then I have an extension method that does the conversion for me:

    public static string ToEncoding(this string s, Encoding targetEncoding)
    {
        // 1252 is the Windows OS code page
        return s == null ? null : targetEncoding.GetString(Encoding.GetEncoding(1252).GetBytes(s));
    }

Something like this should get the job done in Python:


#!/usr/bin/python                                                                                                            
# -*- mode: python; coding: utf-8 -*-                                                                                        
import smtplib
from email.MIMEText import MIMEText
from email.Header import Header
from email.Utils import formatdate

def send_from_gmail( from_addr, to_addr, subject, body, password, encoding="iso-2022-jp" ):

    msg = MIMEText(body.encode(encoding), 'plain', encoding)
    msg['Subject'] = Header(subject.encode(encoding), encoding)
    msg['From'] = from_addr
    msg['To'] = to_addr
    msg['Date'] = formatdate()

    s = smtplib.SMTP('smtp.gmail.com', 587)
    s.ehlo(); s.starttls(); s.ehlo()

    s.login(from_addr, password)
    s.sendmail(from_addr, to_addr, msg.as_string())
    s.close()
    return "Sent mail to: %s" % to_addr



if __name__ == "__main__":
    import sys
    for n,item in enumerate(sys.argv):
        sys.argv[n] = sys.argv[n].decode("utf8")

    if len(sys.argv)==6:
        print send_from_gmail( sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4], sys.argv[5] )
    elif len(sys.argv)==7:
        print send_from_gmail( sys.argv[1], sys.argv[2], sys.argv[3], sys.argv[4], sys.argv[5], encoding=sys.argv[6] )
    else:
        # String exceptions are not valid; exit with a usage message instead
        sys.exit("SYNTAX: %s <from_addr> <to_addr> <subject> <body> <password> [encoding]" % sys.argv[0])

**Blatantly stolen/adapted from:**

http://mtokyo.blog9.fc2.com/blog-entry-127.html

<?php

function sendMail($to, $subject, $body, $from_email,$from_name)
 {
$headers  = "MIME-Version: 1.0 \n" ;
$headers .= "From: " .
       "".mb_encode_mimeheader (mb_convert_encoding($from_name,"ISO-2022-JP","AUTO")) ."" .
       "<".$from_email."> \n";
$headers .= "Reply-To: " .
       "".mb_encode_mimeheader (mb_convert_encoding($from_name,"ISO-2022-JP","AUTO")) ."" .
       "<".$from_email."> \n";


$headers .= "Content-Type: text/plain;charset=ISO-2022-JP \n";


/* Convert body to same encoding as stated
in Content-Type header above */

$body = mb_convert_encoding($body, "ISO-2022-JP","AUTO");

/* Mail, optional parameters. */
$sendmail_params  = "-f$from_email";

mb_language("ja");
$subject = mb_convert_encoding($subject, "ISO-2022-JP","AUTO");
$subject = mb_encode_mimeheader($subject);

$result = mail($to, $subject, $body, $headers, $sendmail_params);

return $result;
}

Japanese encoding was introduced to e-mail on JUNET (a UUCP-based nationwide network) in the early '90s.

At that time, RFC 1468 was defined. If you follow RFC 1468 in plain-text mail, there should be no problem.

If you want to handle HTML mail, RFC 1468 is useless except for the header parts.
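
As a rough illustration of what an RFC 1468 style plain-text message looks like, the sketch below builds one with Python 3's standard `email` package. The addresses and text are placeholders, not values from the original post. Because ISO-2022-JP uses only 7-bit bytes, the standard library labels the body `7bit` instead of base64 encoding it.

```python
from email.header import Header
from email.mime.text import MIMEText

body = "これはテストです。"  # placeholder body text

# The body is converted to ISO-2022-JP and declared as such in Content-Type
msg = MIMEText(body, "plain", "iso-2022-jp")

# Header fields that contain Japanese are written as RFC 2047 encoded-words
msg["Subject"] = Header("テスト", "iso-2022-jp")
msg["From"] = "sender@example.com"      # placeholder address
msg["To"] = "recipient@example.com"     # placeholder address

print(msg["Content-Type"])
print(msg["Content-Transfer-Encoding"])
print(msg.as_string())
```

Printing the message this way makes it possible to inspect the raw headers and body before anything is sent.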

First of all, you should be using:

Encoding.GetEncoding("ISO-2022-JP")

to convert your subject line into the bytes that will be processed by Convert.ToBase64String().

=?ISO-2022-JP?B?TEXTTEXT...?= tells the receiving mail client which encoding was used on the sender's side to convert the Japanese "letters" into a byte stream.

Currently you're using UTF-16 to encode, but specifying ISO-2022-JP to decode. These are obviously two different encodings, just as ISO-8859-1 is different from Unicode (most extended Western European characters are represented by one byte in ISO-8859-x, but two bytes in UTF-16).

I'm not sure what you mean about UTF-8 being a second-class citizen. As long as the receiving mail client understands UTF-8 and is able to convert it to the current Japanese locale, everything is fine.
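
A UTF-8 message can be sketched with Python 3's standard `email` package as shown below (the text is a placeholder). For this sample the library chooses base64 for both the subject encoded-word and the body, since UTF-8 Japanese text is not 7-bit clean. Whether a particular client displays the result correctly still has to be confirmed by testing.

```python
from email.header import Header
from email.mime.text import MIMEText

body = "これはテストです。"  # placeholder body text

# With the utf-8 charset the body is base64 encoded for transport
msg = MIMEText(body, "plain", "utf-8")

# The subject becomes a UTF-8 encoded-word
subject_header = Header("テスト", "utf-8")
msg["Subject"] = subject_header

encoded_subject = subject_header.encode()
print(encoded_subject)
print(msg["Content-Transfer-Encoding"])
```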

Here's what I use to send Japanese emails. The subject line looks fine in Outlook 2010, Gmail and on iPhone.

Encoding encoding = Encoding.GetEncoding("iso-2022-jp");
byte[] bytes  = encoding.GetBytes(subject);
string base64Encoded = Convert.ToBase64String(bytes); // base64, not uuencode
subject = "=?iso-2022-jp?B?" + base64Encoded + "?=";

// not sure this is actually necessary...
mailMessage.SubjectEncoding = Encoding.GetEncoding("iso-2022-jp");
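
One way to sanity-check a hand-built subject like this without opening a mail client is to run it through a MIME header decoder. The sketch below uses Python 3's standard `email.header` module; `decode_subject` is a hypothetical helper name and the sample text is made up. A well-formed encoded-word decodes back to the original text, while one that lacks the closing `?=` is left untouched, which resembles the raw `=?iso-2022-jp?B?...` strings a mail client may show when it cannot decode a subject.

```python
import base64
from email.header import decode_header, make_header


def decode_subject(raw_header):
    """Decode RFC 2047 encoded-words roughly the way a MIME-aware client would."""
    return str(make_header(decode_header(raw_header)))


text = "テスト"  # sample text, chosen only for this illustration
b64 = base64.b64encode(text.encode("iso-2022-jp")).decode("ascii")

good = "=?iso-2022-jp?B?" + b64 + "?="
bad = "=?iso-2022-jp?B?" + b64  # closing "?=" is missing

print(decode_subject(good) == text)  # True
print(decode_subject(bad) == bad)    # True: not recognized as an encoded-word
```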
