
Processing email from Outlook: 7bit encoding causes funny characters in output

I am working on a project where I am building my own SMTP server. (Please, nobody ask why or suggest things like Postfix; I have my reasons.)

It mostly works fine, except with Outlook: there seems to be a problem with the encoding of the data I receive from Outlook.

I keep getting content like this:

<html xmlns:v=3D"urn:schemas-microsoft-com:vml" =
xmlns:o=3D"urn:schemas-microsoft-com:office:office" =
xmlns:w=3D"urn:schemas-microsoft-com:office:word" =

Instead of:

<html xmlns:v="urn:schemas-microsoft-com:vml" =
xmlns:o="urn:schemas-microsoft-com:office:office" =
xmlns:w="urn:schemas-microsoft-com:office:word" =

Notice that the 3D is not there in the valid content.

I have a function that listens on the socket for SMTP data, which looks like this:

if (stream.CanRead)
{
    byte[] serverData = new byte[1024];
    StringBuilder stringBuilder = new StringBuilder();
    int numberOfBytesRead = 0;
    do
    {
        numberOfBytesRead = stream.Read(serverData, 0, serverData.Length);
        Encoding encoding = Encoding.GetEncoding("UTF-7", new FallbackEncoding(), new FallbackDecoding());
        stringBuilder.AppendFormat("{0}", encoding.GetString(serverData, 0, numberOfBytesRead));
    } while (stream.DataAvailable);

    return stringBuilder.ToString();
}
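The loop above decodes each 1024-byte chunk separately as UTF-7 and stops as soon as stream.DataAvailable is false, which only means nothing is buffered at that instant, not that the client has finished sending. As a hedged sketch (SmtpData and ReadMessage are hypothetical names, not from the original code), the content that follows the DATA command could instead be read as raw bytes until the end-of-data line "." defined in RFC 5321, undoing dot-stuffing and leaving charset and transfer decoding to the MIME layer:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: reads SMTP message content after the DATA command.
static class SmtpData
{
    // Returns the raw message bytes up to (not including) the "." CRLF line.
    // Lines that start with "." were dot-stuffed by the client; one dot is removed.
    public static byte[] ReadMessage(Stream stream)
    {
        var message = new MemoryStream();
        var line = new MemoryStream();
        int b;
        while ((b = stream.ReadByte()) != -1)
        {
            line.WriteByte((byte)b);
            if (b != '\n') continue; // keep collecting until end of line

            byte[] raw = line.ToArray();
            line.SetLength(0);

            // End-of-data marker: a line consisting only of "." followed by CRLF.
            if (raw.Length == 3 && raw[0] == '.' && raw[1] == '\r')
                return message.ToArray();

            // Undo dot-stuffing: drop the first of the leading dots.
            int start = raw[0] == '.' ? 1 : 0;
            message.Write(raw, start, raw.Length - start);
        }
        throw new EndOfStreamException("Connection closed before the end-of-data line.");
    }
}
```

In a server this might be called as SmtpData.ReadMessage(new BufferedStream(networkStream)), because reading a NetworkStream one byte at a time is slow; later commands should then be read through the same BufferedStream, since it may already hold bytes past the terminator. The =3D sequences pass through this step unchanged: they are part of the message content, not an artifact of the SMTP transport.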

In my FallbackDecoding class I have the following code:

class FallbackDecoding : DecoderFallback
    {
        public override int MaxCharCount
        {
            get
            {
                return 1;
            }
        }

        public override DecoderFallbackBuffer CreateFallbackBuffer()
        {
            return new Buffer();
        }

        private class Buffer : DecoderFallbackBuffer
        {
            private int _fallbackIndex;
            private string _fallbackString;

            public override int Remaining
            {
                get
                {
                    return _fallbackString.Length - _fallbackIndex;
                }
            }

            public override bool Fallback(byte[] bytesUnknown, int index)
            {
                byte unknownChar = bytesUnknown[index];
                _fallbackString = Encoding.ASCII.GetString(new[] { (byte)(unknownChar & 127) });
                _fallbackIndex = 0;
                return true;
            }

            public override char GetNextChar()
            {
                if (Remaining > 0)
                {
                    return _fallbackString[_fallbackIndex++];
                }
                else
                {
                    return '\0';
                }
            }

            public override bool MovePrevious()
            {
                if (_fallbackIndex > 0)
                {
                    _fallbackIndex--;
                    return true;
                }
                return false;
            }
        }
    }

For some reason the decoder fallback class throws an exception in the public override bool Fallback function. It throws because bytesUnknown has only 1 item in the array, but the index parameter is 128, so it throws an index-out-of-range exception, and I have no idea why.
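About that exception: in DecoderFallbackBuffer.Fallback(byte[] bytesUnknown, int index), index is the position of bytesUnknown[0] in the original input (presumably offset 128 of the 1024-byte read buffer here), not an index into bytesUnknown, so bytesUnknown[index] goes out of range. A minimal sketch of a fallback that uses the unknown bytes themselves, keeping the original idea of clearing the high bit (StripHighBitFallback is a hypothetical name):

```csharp
using System;
using System.Text;

// Hypothetical fallback: replaces each undecodable byte b with the char (b & 0x7F).
sealed class StripHighBitFallback : DecoderFallback
{
    public override int MaxCharCount => 1; // one char per unknown byte

    public override DecoderFallbackBuffer CreateFallbackBuffer() => new Buffer();

    private sealed class Buffer : DecoderFallbackBuffer
    {
        private string _chars = string.Empty;
        private int _pos;

        public override int Remaining => _chars.Length - _pos;

        public override bool Fallback(byte[] bytesUnknown, int index)
        {
            // 'index' is the offset of bytesUnknown[0] in the whole input,
            // so iterate over bytesUnknown instead of indexing it with 'index'.
            var sb = new StringBuilder(bytesUnknown.Length);
            foreach (byte b in bytesUnknown)
                sb.Append((char)(b & 0x7F));
            _chars = sb.ToString();
            _pos = 0;
            return true;
        }

        // Returning '\0' tells the decoder that the fallback has no more chars.
        public override char GetNextChar() => Remaining > 0 ? _chars[_pos++] : '\0';

        public override bool MovePrevious()
        {
            if (_pos == 0) return false;
            _pos--;
            return true;
        }

        public override void Reset()
        {
            _chars = string.Empty;
            _pos = 0;
        }
    }
}
```

It could be plugged in with Encoding.GetEncoding("us-ascii", EncoderFallback.ReplacementFallback, new StripHighBitFallback()). Note that stripping the high bit only hides invalid bytes; it does nothing about the =3D sequences, which are valid ASCII.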

I've tried changing ASCII to UTF-7, since Outlook sends the data as 7bit, but it doesn't seem to make any difference.
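The "7bit" in Content-Transfer-Encoding: 7bit means the part is plain US-ASCII text with short lines; it is not UTF-7. UTF-7 gives "+" a special meaning (it starts a base64-encoded run of UTF-16 characters), so decoding ordinary ASCII as UTF-7 corrupts text around any "+". A small sketch, using the Thread-Index header value from the email as input (Utf7VsAscii and DecodeBoth are hypothetical names; UTF7Encoding is marked obsolete in .NET 5 and later, hence the pragma):

```csharp
using System;
using System.Text;

// Hypothetical demo helper: decodes the same ASCII bytes two ways.
static class Utf7VsAscii
{
    public static (string Ascii, string Utf7) DecodeBoth(string asciiText)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(asciiText);
#pragma warning disable SYSLIB0001 // UTF-7 is obsolete; used only to show the problem
        var utf7 = new UTF7Encoding();
#pragma warning restore SYSLIB0001
        return (Encoding.ASCII.GetString(bytes), utf7.GetString(bytes));
    }
}
```

With "AdDcUeHbbPyOUTipQ462DEYroR+DWg==" the ASCII result round-trips unchanged, while the UTF-7 result does not, because "+DWg" is read as a base64 sequence.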

Because of the HTML in the email I'm receiving, the formatting is wrong when I pass the email on, and sometimes I just get garbage in the email.

Update

Full email headers, as requested:

Message-ID: <000d01d0dc52$0c0d4690$2427d3b0$@chrisboard.co.uk>
MIME-Version: 1.0
Content-Type: multipart/alternative;
    boundary="----=_NextPart_000_000E_01D0DC5A.6DD24AD0"
X-Mailer: Microsoft Outlook 15.0
Thread-Index: AdDcUeHbbPyOUTipQ462DEYroR+DWg==
Content-Language: en-gb

This is a multipart message in MIME format.

------=_NextPart_000_000E_01D0DC5A.6DD24AD0
Content-Type: text/plain;
    charset="us-ascii"
Content-Transfer-Encoding: 7bit

This is the content of the message


------=_NextPart_000_000E_01D0DC5A.6DD24AD0
Content-Type: text/html;
    charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<html xmlns:v=3D"urn:schemas-microsoft-com:vml" =
xmlns:o=3D"urn:schemas-microsoft-com:office:office" =
xmlns:w=3D"urn:schemas-microsoft-com:office:word" =
xmlns:m=3D"http://schemas.microsoft.com/office/2004/12/omml" =
xmlns=3D"http://www.w3.o

Quoted-printable and ASCII text with long lines and =

The HTML part is encoded using quoted-printable encoding (see Content-Transfer-Encoding: quoted-printable in its headers). Quoted-printable uses special 3-byte sequences: = followed by two hexadecimal digits giving the byte value. Because = itself introduces these sequences, quoted-printable encodes = (0x3D) as =3D. It is the only printable ASCII character (33-126) that must be encoded.

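BTW, the = at the end of a line is also a product of quoted-printable encoding. It is a "soft line break" that splits long lines (encoded lines may be at most 76 characters), and a decoder removes it together with the line break that follows it.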
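A minimal decoding sketch, assuming the body of the text/html part has already been extracted as a string (QuotedPrintable, Decode and DecodeToString are hypothetical names). It turns =XX into the byte 0xXX, removes soft line breaks, and then applies the charset from the part's Content-Type (us-ascii in this email):

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical quoted-printable decoder (RFC 2045, section 6.7).
// Simplification: trailing whitespace before a line break is not stripped.
static class QuotedPrintable
{
    public static byte[] Decode(string encoded)
    {
        var output = new MemoryStream(encoded.Length);
        int i = 0;
        while (i < encoded.Length)
        {
            char c = encoded[i];
            if (c != '=')
            {
                output.WriteByte((byte)c); // literal char; a valid QP body is ASCII only
                i++;
            }
            else if (i + 1 < encoded.Length && encoded[i + 1] == '\n')
            {
                i += 2; // soft line break with a bare LF
            }
            else if (i + 2 < encoded.Length && encoded[i + 1] == '\r' && encoded[i + 2] == '\n')
            {
                i += 3; // soft line break: "=" CRLF is removed entirely
            }
            else if (i + 2 < encoded.Length && HexValue(encoded[i + 1]) >= 0 && HexValue(encoded[i + 2]) >= 0)
            {
                // "=XX" is the byte with hex value XX, e.g. "=3D" is 0x3D ('=')
                output.WriteByte((byte)(HexValue(encoded[i + 1]) * 16 + HexValue(encoded[i + 2])));
                i += 3;
            }
            else
            {
                output.WriteByte((byte)'='); // malformed escape: keep "=" literally
                i++;
            }
        }
        return output.ToArray();
    }

    // Decodes, then interprets the bytes using the part's declared charset.
    public static string DecodeToString(string encoded, Encoding charset) =>
        charset.GetString(Decode(encoded));

    private static int HexValue(char c) =>
        c >= '0' && c <= '9' ? c - '0' :
        c >= 'A' && c <= 'F' ? c - 'A' + 10 :
        c >= 'a' && c <= 'f' ? c - 'a' + 10 : -1;
}
```

Only parts marked Content-Transfer-Encoding: quoted-printable need this step; the text/plain part marked 7bit is already plain ASCII and can be used as-is.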
