C# Email subject parsing
I'm building a system for reading emails in C#. I've got a problem parsing the subject, which I think is related to encoding.
The subject I'm reading is as follows:
=?ISO-8859-1?Q?=E6=F8sd=E5f=F8sdf_sdfsdf?=
The original subject sent was æøsdåføsdf sdfsdf (note the Norwegian characters).
Any ideas how I can change the encoding or parse this correctly? So far I've tried to use the C# encoding conversion techniques to convert the subject to UTF-8, but without any luck.
Here is one of the solutions I tried:
Encoding iso = Encoding.GetEncoding("iso-8859-1");
Encoding utf = Encoding.UTF8;
string decodedSubject =
    utf.GetString(Encoding.Convert(utf, iso,
        iso.GetBytes(m.Subject.Split('?')[3])));
The encoding is called quoted-printable.
See the answers to this question.
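To make the structure of the subject concrete, here is a minimal sketch that splits the encoded word into its charset, encoding flag and payload, and decodes a single =XX escape by hand. The class name EncodedWordParts and the helper DecodeEscape are hypothetical names used only for illustration; this is not a complete decoder.

```csharp
using System;
using System.Globalization;
using System.Text;

static class EncodedWordParts
{
    // Hypothetical helper for illustration: decodes one "=XX" escape
    // into a string using the given charset.
    public static string DecodeEscape(string escape, string charset)
    {
        // "=E6" -> byte 0xE6
        byte b = byte.Parse(escape.Substring(1), NumberStyles.HexNumber,
                            CultureInfo.InvariantCulture);
        return Encoding.GetEncoding(charset).GetString(new[] { b });
    }

    public static void Main()
    {
        string subject = "=?ISO-8859-1?Q?=E6=F8sd=E5f=F8sdf_sdfsdf?=";

        // Splitting on '?' gives: "=", charset, encoding flag, payload, "="
        string[] parts = subject.Split('?');
        string charset = parts[1];   // ISO-8859-1
        string encoding = parts[2];  // Q
        string payload = parts[3];   // =E6=F8sd=E5f=F8sdf_sdfsdf

        Console.WriteLine(charset + " / " + encoding + " / " + payload);

        // The payload is plain ASCII; each =XX escape stands for one byte
        // that must be interpreted in the declared charset.
        Console.WriteLine(DecodeEscape("=E6", charset) == "\u00E6");
    }
}
```

This also suggests why the attempt in the question fails: the payload is ASCII text containing escapes, so converting it between ISO-8859-1 and UTF-8 leaves the escapes in place.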
Adapted from the accepted answer:
// Requires: using System.Net.Mail;
public string DecodeQuotedPrintable(string value)
{
    // Relies on Attachment decoding the encoded name it is given.
    Attachment attachment = Attachment.CreateAttachmentFromString("", value);
    return attachment.Name;
}
When passed the string
=?ISO-8859-1?Q?=E6=F8sd=E5f=F8sdf_sdfsdf?=
this returns "æøsdåføsdf_sdfsdf".
// Requires: using System; using System.Text; using System.Text.RegularExpressions;
public static string DecodeEncodedWordValue(string mimeString)
{
    var regex = new Regex(@"=\?(?<charset>.*?)\?(?<encoding>[qQbB])\?(?<value>.*?)\?=");
    var encodedString = mimeString;
    var decodedString = string.Empty;
    while (encodedString.Length > 0)
    {
        var match = regex.Match(encodedString);
        if (match.Success)
        {
            // If the match isn't at the start of the string, copy the initial few chars to the output
            decodedString += encodedString.Substring(0, match.Index);
            var charset = match.Groups["charset"].Value;
            var encoding = match.Groups["encoding"].Value.ToUpper();
            var value = match.Groups["value"].Value;
            if (encoding.Equals("B"))
            {
                // Encoded value is Base-64
                var bytes = Convert.FromBase64String(value);
                decodedString += Encoding.GetEncoding(charset).GetString(bytes);
            }
            else if (encoding.Equals("Q"))
            {
                // Encoded value is Q-encoded (similar to Quoted-Printable).
                // An underscore stands for a space; replace it in this value only,
                // so text that was already decoded is left untouched.
                value = value.Replace('_', ' ');
                // Decode each run of consecutive =XX escapes (XX is hexadecimal) as one
                // byte sequence, so multi-byte charsets such as UTF-8 decode correctly.
                var regx = new Regex("(=[0-9A-F]{2})+", RegexOptions.IgnoreCase);
                decodedString += regx.Replace(value, new MatchEvaluator(delegate(Match m)
                {
                    var hex = m.Value.Replace("=", string.Empty);
                    var bytes = new byte[hex.Length / 2];
                    for (var i = 0; i < bytes.Length; i++)
                    {
                        bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
                    }
                    // Return the string in the charset defined
                    return Encoding.GetEncoding(charset).GetString(bytes);
                }));
            }
            else
            {
                // Encoded value not known, return the rest of the original string
                // (Match should not be successful in this case, so this code may never get hit)
                decodedString += encodedString.Substring(match.Index);
                break;
            }
            // Trim off up to and including the match, then we'll loop and try matching again.
            encodedString = encodedString.Substring(match.Index + match.Length);
        }
        else
        {
            // No match, not encoded, return original string
            decodedString += encodedString;
            break;
        }
    }
    return decodedString;
}