Convert MHTML to HTML using C#

I was tasked with embedding an MHTML file into an email body. The issue is that MHTML is not a normal HTML file, so I cannot embed it directly into the email.

How can I convert the MHTML into an HTML file?

Thanks

I found the solution at this link:

Original (Dead) Link

Archived Link

The solution was to extract the HTML encoded as Base64 inside the MHTML.

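using System;
using System.IO;
using System.Text;

// mhtFile is the path to the .mht/.mhtml file.
static string ExtractHtml(string mhtFile)
{
    var decoded_text = new StringBuilder();
    using (var reader = new StreamReader(mhtFile))
    {
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();
            // Skip ahead to the first Base64-encoded part (assumed to be the HTML).
            if (line != "Content-Transfer-Encoding: base64") continue;

            reader.ReadLine(); //chew up the blank line
            // Decode line by line until a blank line, a MIME boundary ("--...") or EOF
            // (ReadLine returns null at EOF, which IsNullOrEmpty also catches).
            while (!string.IsNullOrEmpty(line = reader.ReadLine()) && !line.StartsWith("--", StringComparison.Ordinal))
                decoded_text.Append(
                    Encoding.UTF8.GetString(
                        Convert.FromBase64String(line)));
            break;
        }
    }
    return decoded_text.ToString();
}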
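A slightly more defensive sketch of the same idea, under a few stated assumptions: the HTML lives in a single MIME part whose headers include both `Content-Type: text/html` and `Content-Transfer-Encoding: base64` (in any order, possibly mixed with other headers such as `Content-Location`), each part starts after a boundary line beginning with `--`, and the content is UTF-8 (the `charset` parameter is ignored). `MhtmlExtractor` and `ExtractBase64Html` are hypothetical names, not a framework API. It gathers the whole Base64 body before decoding it once.

```csharp
using System;
using System.IO;
using System.Text;

public static class MhtmlExtractor
{
    // Hypothetical helper: returns the decoded HTML of the first part that is both
    // text/html and base64-encoded, or null if no such part exists.
    // Assumes UTF-8 content; the charset parameter is not inspected.
    public static string ExtractBase64Html(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // Each part's headers follow a boundary line ("--boundary").
            if (!line.StartsWith("--", StringComparison.Ordinal)) continue;

            bool isHtml = false, isBase64 = false;
            // Read the part headers up to the blank line that separates them from the body.
            while (!string.IsNullOrEmpty(line = reader.ReadLine()))
            {
                if (line.StartsWith("Content-Type:", StringComparison.OrdinalIgnoreCase) &&
                    line.IndexOf("text/html", StringComparison.OrdinalIgnoreCase) >= 0)
                    isHtml = true;
                if (line.StartsWith("Content-Transfer-Encoding:", StringComparison.OrdinalIgnoreCase) &&
                    line.IndexOf("base64", StringComparison.OrdinalIgnoreCase) >= 0)
                    isBase64 = true;
            }
            if (line == null) break;              // EOF inside the headers
            if (!(isHtml && isBase64)) continue;  // not the part we want; keep scanning

            // Collect the whole Base64 body first, then decode once, so a multi-byte
            // UTF-8 character split across two lines is not corrupted.
            var base64 = new StringBuilder();
            while (!string.IsNullOrEmpty(line = reader.ReadLine()) &&
                   !line.StartsWith("--", StringComparison.Ordinal))
                base64.Append(line.Trim());
            return Encoding.UTF8.GetString(Convert.FromBase64String(base64.ToString()));
        }
        return null;
    }
}
```

Usage would be something like `MhtmlExtractor.ExtractBase64Html(new StreamReader(mhtFile))`. Taking a `TextReader` rather than a path keeps it easy to test with a `StringReader`.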

The accepted solution works fine as long as the HTML contains no letters with diacritics (for example the Czech ěščřžýáíé) or other characters that take two or more bytes in UTF-8. If the first byte of such a character ends up at the end of the variable "line" and the second byte at the beginning of the next one, an unreadable character appears in the resulting HTML. Joining all the Base64 lines first and decoding them once avoids this:

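using System;
using System.IO;
using System.Text;

// mhtFile is the path to the .mht/.mhtml file.
static string ExtractHtml(string mhtFile)
{
    var base64_text = new StringBuilder();
    using (var reader = new StreamReader(mhtFile))
    {
        while (!reader.EndOfStream)
        {
            var line = reader.ReadLine();
            if (line != "Content-Transfer-Encoding: base64") continue;

            reader.ReadLine(); //chew up the blank line
            // Collect every Base64 line of the part (stop at a blank line, a boundary or EOF)...
            while (!string.IsNullOrEmpty(line = reader.ReadLine()) && !line.StartsWith("--", StringComparison.Ordinal))
                base64_text.Append(line);
            break;
        }
    }
    // ...then decode in one go, so multi-byte characters split across lines stay intact.
    return Encoding.UTF8.GetString(Convert.FromBase64String(base64_text.ToString()));
}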
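A small self-contained sketch of why per-line decoding breaks. The string "abě" is the UTF-8 bytes 61 62 C4 9B; Base64 works in 3-byte groups, so encoding it as the two lines "YWLE" (61 62 C4) and "mw==" (9B) puts the two bytes of "ě" on different lines. Real MIME Base64 lines are longer (RFC 2045 allows up to 76 characters), but the effect is the same whenever a character straddles a line break. `SplitDemo` and its methods are purely illustrative names.

```csharp
using System;
using System.Text;

public static class SplitDemo
{
    // Decodes each Base64 line on its own, like the accepted answer.
    // A multi-byte character cut in half becomes U+FFFD replacement characters.
    public static string DecodePerLine(string[] lines)
    {
        var sb = new StringBuilder();
        foreach (var line in lines)
            sb.Append(Encoding.UTF8.GetString(Convert.FromBase64String(line)));
        return sb.ToString();
    }

    // Joins all Base64 lines first and decodes once, like the answer above.
    public static string DecodeJoined(string[] lines) =>
        Encoding.UTF8.GetString(Convert.FromBase64String(string.Concat(lines)));
}
```

With `new[] { "YWLE", "mw==" }`, `DecodeJoined` returns "abě", while `DecodePerLine` returns "ab" followed by replacement characters instead of "ě".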

I opened the .mhtml of this page in a text editor (Notepad++), and the HTML appears to be in the file, intact; you have to scroll a long way down past all the CSS. I would just write something that extracts the HTML text directly from the file rather than deal with the Base64 data (too confusing for me if something doesn't work right).
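If the HTML looks readable in a text editor, a plausible explanation (an assumption, not something verified here) is that the part uses `Content-Transfer-Encoding: quoted-printable` rather than Base64. The text is then still not quite raw HTML: lines ending in `=` are soft line breaks that must be joined, and `=XX` sequences are escaped bytes (for example `=C4=9B` for "ě" in UTF-8). A minimal decoder sketch, assuming the part body has already been cut out of the file and the charset is UTF-8; `QuotedPrintable.Decode` is a hypothetical helper, not a framework API.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class QuotedPrintable
{
    // Minimal quoted-printable decoder (RFC 2045, section 6.7).
    // Assumes the decoded bytes are UTF-8 text.
    public static string Decode(string body)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < body.Length; i++)
        {
            if (body[i] == '=')
            {
                // Soft line break "=\r\n" or "=\n": drop it, joining the two lines.
                if (i + 2 < body.Length && body[i + 1] == '\r' && body[i + 2] == '\n') { i += 2; continue; }
                if (i + 1 < body.Length && body[i + 1] == '\n') { i += 1; continue; }
                // Hex escape such as "=C4": emit that single byte.
                if (i + 2 < body.Length &&
                    byte.TryParse(body.Substring(i + 1, 2), NumberStyles.AllowHexSpecifier,
                                  CultureInfo.InvariantCulture, out byte b))
                {
                    bytes.Add(b);
                    i += 2;
                    continue;
                }
            }
            // Literal character (normally 7-bit ASCII); re-encode as UTF-8,
            // keeping surrogate pairs together.
            int len = char.IsSurrogatePair(body, i) ? 2 : 1;
            bytes.AddRange(Encoding.UTF8.GetBytes(body.Substring(i, len)));
            i += len - 1;
        }
        return Encoding.UTF8.GetString(bytes.ToArray());
    }
}
```

For example, `QuotedPrintable.Decode("<p>=C4=9B</p>")` gives "<p>ě</p>". Collecting the escaped bytes before decoding has the same motivation as joining the Base64 lines: a multi-byte character is only decoded once all of its bytes are present.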
