
Convert MHTML to HTML using C#

I was tasked with embedding an MHTML file into an email body. The problem is that MHTML is not a plain HTML file, so I cannot embed it directly in the email.

How can I convert the MHTML into an HTML file?

Thanks

I found the solution at this link:

Original (Dead) Link

Archived Link

The solution was to extract the HTML that is encoded as Base64 inside the MHTML.

// Requires: using System; using System.IO; using System.Text;
var decoded_text = new StringBuilder();
using (var reader = new StreamReader(mhtFile))
{
    while (!reader.EndOfStream)
    {
        var line = reader.ReadLine();
        // Skip ahead to the first part that is declared as Base64.
        if (line != "Content-Transfer-Encoding: base64") continue;

        reader.ReadLine(); // chew up the blank line
        // Decode line by line until the blank line (or end of file) that ends the part.
        while (!string.IsNullOrEmpty(line = reader.ReadLine()))
            decoded_text.Append(
                Encoding.UTF8.GetString(
                    Convert.FromBase64String(line)));
        break;
    }
}
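Once the HTML has been extracted, it still has to go into the email body, which was the original goal. A minimal sketch of that step, assuming `System.Net.Mail` is used; the class name, addresses, and subject are placeholders for illustration and do not come from the original post:

```csharp
using System.Net.Mail;

public static class MailSketch
{
    // Builds (but does not send) a message whose body is the extracted HTML.
    // The addresses and subject below are hypothetical placeholders.
    public static MailMessage BuildHtmlMail(string html)
    {
        return new MailMessage("sender@example.com", "recipient@example.com")
        {
            Subject = "Report",
            Body = html,
            IsBodyHtml = true // ask mail clients to render the body as HTML
        };
    }
}
```

Note that the extraction above only takes the first Base64 part, so images or stylesheets stored as separate MHTML parts would not travel with the message.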

The accepted solution works fine as long as the HTML contains no letters with diacritics (ěščřžýáíé, Czech diacritics for example) or other multi-byte characters. If the first byte of such a character falls at the end of one `line` and the second byte at the start of the next, an unreadable character appears in the resulting HTML. Collecting all the Base64 lines first and decoding them in one go avoids this:

// Requires: using System; using System.IO; using System.Text;
var base64_text = new StringBuilder();
using (var reader = new StreamReader(mhtFile))
{
    while (!reader.EndOfStream)
    {
        var line = reader.ReadLine();
        if (line != "Content-Transfer-Encoding: base64") continue;

        reader.ReadLine(); // chew up the blank line
        // Collect the raw Base64 lines; stop at the blank line or end of file.
        while (!string.IsNullOrEmpty(line = reader.ReadLine()))
            base64_text.Append(line);
        break;
    }
    // Decode once, so multi-byte characters split across lines stay intact.
    return Encoding.UTF8.GetString(Convert.FromBase64String(base64_text.ToString()));
}
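The split-character problem described above can be reproduced in isolation. The following is only an illustrative sketch; the class and method names are made up for the demonstration. It cuts the UTF-8 bytes of "abčcd" into two 3-byte Base64 chunks so that the two bytes of "č" land in different chunks, then compares per-chunk decoding with decoding the joined text:

```csharp
using System;
using System.Text;

public static class SplitDecodeDemo
{
    // "ab\u010Dcd" is "abčcd"; in UTF-8 that is 61 62 C4 8D 63 64,
    // so cutting after 3 bytes separates the two bytes of "č".
    public static string[] MakeChunks()
    {
        byte[] bytes = Encoding.UTF8.GetBytes("ab\u010Dcd");
        return new[]
        {
            Convert.ToBase64String(bytes, 0, 3),
            Convert.ToBase64String(bytes, 3, 3)
        };
    }

    // Converts each chunk to text on its own, like the line-by-line version.
    public static string DecodePerChunk(string[] chunks)
    {
        var sb = new StringBuilder();
        foreach (var chunk in chunks)
            sb.Append(Encoding.UTF8.GetString(Convert.FromBase64String(chunk)));
        return sb.ToString();
    }

    // Joins the Base64 text first and converts to a string only once.
    public static string DecodeJoined(string[] chunks)
    {
        return Encoding.UTF8.GetString(
            Convert.FromBase64String(string.Concat(chunks)));
    }
}
```

`DecodeJoined(MakeChunks())` gives back the original text, while `DecodePerChunk(MakeChunks())` contains replacement characters (U+FFFD) where "č" should be.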

I opened the .mhtml saved from this page in a text editor (Notepad++), and the HTML appears to be in the file, intact. You have to scroll way down past all the CSS. I would just write something to extract the HTML text from within the file rather than deal with the Base64 data (too confusing for me if something doesn't work right).
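When the HTML is visible as text like this, the part is usually stored as quoted-printable rather than Base64, so it is almost readable but still contains `=XX` escapes and `=` soft line breaks. A rough sketch of pulling out the first `text/html` part under those conditions follows. It assumes UTF-8 content and a `boundary=` parameter that ends its header line; the class and method names are hypothetical, and this is not a full MIME parser:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class MhtmlHtmlPart
{
    // Returns the decoded body of the first text/html part, or null if none is found.
    public static string Extract(TextReader reader)
    {
        const string boundaryKey = "boundary=";
        const string cteKey = "content-transfer-encoding:";
        string boundary = null, line;

        // Top-level headers: look for the multipart boundary.
        while (!string.IsNullOrEmpty(line = reader.ReadLine()))
        {
            int i = line.IndexOf(boundaryKey, StringComparison.OrdinalIgnoreCase);
            if (i >= 0)
                boundary = "--" + line.Substring(i + boundaryKey.Length)
                                      .Trim().Trim(';').Trim('"');
        }
        if (boundary == null) return null;

        while ((line = reader.ReadLine()) != null)
        {
            // Skip everything until the next part starts.
            if (!line.StartsWith(boundary, StringComparison.Ordinal)) continue;

            // Part headers, in any order, up to the blank line.
            bool isHtml = false;
            string cte = "7bit";
            while (!string.IsNullOrEmpty(line = reader.ReadLine()))
            {
                string lower = line.ToLowerInvariant();
                if (lower.StartsWith("content-type:") && lower.Contains("text/html"))
                    isHtml = true;
                if (lower.StartsWith(cteKey))
                    cte = lower.Substring(cteKey.Length).Trim();
            }
            if (!isHtml) continue;

            // Part body: everything up to the next boundary line.
            var body = new List<string>();
            while ((line = reader.ReadLine()) != null &&
                   !line.StartsWith(boundary, StringComparison.Ordinal))
                body.Add(line);
            return Decode(body, cte);
        }
        return null;
    }

    private static string Decode(List<string> body, string cte)
    {
        if (cte == "base64")
            return Encoding.UTF8.GetString(Convert.FromBase64String(string.Concat(body)));
        if (cte != "quoted-printable")
            return string.Join("\r\n", body); // 7bit/8bit: take the text as-is

        var bytes = new List<byte>();
        foreach (string raw in body)
        {
            // A trailing '=' is a soft line break: the line continues on the next one.
            bool soft = raw.EndsWith("=", StringComparison.Ordinal);
            string s = soft ? raw.Substring(0, raw.Length - 1) : raw;
            for (int i = 0; i < s.Length; i++)
            {
                if (s[i] == '=' && i + 2 < s.Length + 0 &&
                    Uri.IsHexDigit(s[i + 1]) && Uri.IsHexDigit(s[i + 2]))
                {
                    bytes.Add(Convert.ToByte(s.Substring(i + 1, 2), 16)); // =XX escape
                    i += 2;
                }
                else
                {
                    bytes.Add((byte)s[i]); // quoted-printable text is plain ASCII
                }
            }
            if (!soft) { bytes.Add(13); bytes.Add(10); } // hard line break
        }
        // Decode all bytes at once so multi-byte characters are never split.
        return Encoding.UTF8.GetString(bytes.ToArray());
    }
}
```

It could be called as `MhtmlHtmlPart.Extract(new StreamReader(mhtFile))`, where `mhtFile` is the same path or stream used in the snippets above.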
