
Extracting Content from MHT Document

Is anybody aware of any libraries for working with MHT files (multipart MIME files) in .NET? I need to programmatically extract the contents from an existing MHT file containing a Flash website. I haven't been able to locate any such libraries.

Also, if there's a native way in .NET that I'm not aware of, please feel free to let me know.

EDIT: I know that the MailMessage class supports multipart MIME messages through the AlternateViews property. The AlternateView class represents the alternative views in a multipart MIME message. I'd like to believe it's possible to use this to build something with code native to the .NET Framework. I just haven't found the right combination to make it work, so I'm starting to lose faith. Does anybody know whether it's possible to extract the contents of an MHT file through AlternateView and related classes? For example, it would be nice if it were possible to create an instance of the MailMessage class from a Stream.
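Short of a full library, the BCL does offer one useful piece: System.Net.Mime.ContentType parses a Content-Type header value, including its boundary parameter, and the rest of an MHT file is plain text. The following is a minimal sketch of splitting an MHT file into its parts with only that and string handling. MhtPart and MhtSplitter are hypothetical names made up for this example, not part of MailMessage/AlternateView or any library mentioned here, and the sketch assumes a well-formed, single-level multipart/related file that fits in memory and has a top-level Content-Type header.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Mime;

// Hypothetical type: one body part of an MHT (multipart/related) file.
public sealed class MhtPart
{
    public Dictionary<string, string> Headers { get; } =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    public string Body { get; set; } = "";
}

// Hypothetical helper: splits a single-level multipart MHT file into its parts.
public static class MhtSplitter
{
    // Reads "Name: value" lines up to the first blank line, unfolding
    // continuation lines (lines that start with a space or tab).
    static Dictionary<string, string> ReadHeaders(string[] lines, ref int i)
    {
        var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string lastName = null;
        for (; i < lines.Length && lines[i].Length > 0; i++)
        {
            string line = lines[i];
            if ((line[0] == ' ' || line[0] == '\t') && lastName != null)
            {
                headers[lastName] += " " + line.Trim(); // folded header continuation
                continue;
            }
            int colon = line.IndexOf(':');
            if (colon <= 0) continue; // ignore malformed header lines
            lastName = line.Substring(0, colon).Trim();
            headers[lastName] = line.Substring(colon + 1).Trim();
        }
        i++; // step over the blank line that ends the header block
        return headers;
    }

    public static List<MhtPart> Split(string mht)
    {
        string[] lines = mht.Replace("\r\n", "\n").Split('\n');
        int i = 0;

        // The top-level Content-Type (e.g. multipart/related) carries the boundary;
        // System.Net.Mime.ContentType parses its parameters.
        var top = ReadHeaders(lines, ref i);
        string delimiter = "--" + new ContentType(top["Content-Type"]).Boundary;
        string closeDelimiter = delimiter + "--";

        // Skip the preamble ("This is a multi-part message in MIME format.").
        while (i < lines.Length && lines[i].TrimEnd() != delimiter) i++;

        var parts = new List<MhtPart>();
        while (i < lines.Length && lines[i].TrimEnd() == delimiter)
        {
            i++;
            var part = new MhtPart();
            foreach (var header in ReadHeaders(lines, ref i))
                part.Headers[header.Key] = header.Value;

            // The body runs until the next delimiter or the closing delimiter.
            var body = new List<string>();
            while (i < lines.Length && lines[i].TrimEnd() != delimiter
                                    && lines[i].TrimEnd() != closeDelimiter)
                body.Add(lines[i++]);
            part.Body = string.Join("\n", body);
            parts.Add(part);
        }
        return parts;
    }
}
```

Each part's Content-Type and Content-Location headers indicate which part is the HTML page and which are embedded resources such as a .swf file. The bodies are returned still encoded according to each part's Content-Transfer-Encoding (typically base64 or quoted-printable).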

http://www.lumisoft.ee/lswww/ENG/Products/Mail_Server/mail_index_eng.aspx?type=info

This is an open-source email server that has a good MIME parser.

You might be interested in my MIME parsing project on GitHub (written in C#):

https://github.com/smithimage/MIMER/

It also has a NuGet package:

https://nuget.org/packages/MIMER/

David Benko did a great job with his GitHub project. I recently faced this issue: I had an MHTML file that needed to be converted to an HTML file. For that, I used the HtmlAgilityPack DLL to extract the content from the MHTML file and fed that content to David's library:

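using System.IO;
using HtmlAgilityPack; // HtmlDocument
// MHTMLParser and getHTMLText() come from David Benko's GitHub project (add a using for its namespace)

string filePath = @"D:\Temp\myfile.mhtml";
var doc = new HtmlDocument();
doc.Load(filePath);                          // load the .mhtml file with HtmlAgilityPack
string mhtml = doc.DocumentNode.OuterHtml;   // text handed to the MHTML parser
var parser = new MHTMLParser(mhtml);
string htmlContent = parser.getHTMLText();   // HTML extracted by the parser
File.WriteAllText(@"D:\Temp\file.html", htmlContent);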

I would really appreciate it if someone could verify this approach. Cheers, Vaqar
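Whichever parser is used, the part bodies inside an MHT file are usually transfer-encoded: HTML parts are often quoted-printable and binary resources such as the Flash .swf file are often base64. One way to sanity-check a parser's output is to decode a part yourself. The following is a minimal sketch of that decoding step. MhtDecoding and DecodeBody are hypothetical names, not part of MHTMLParser or HtmlAgilityPack, and the sketch assumes the part body has already been cut out of the file as a string and that its Content-Transfer-Encoding value is known.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: decodes an MHT part body according to its
// Content-Transfer-Encoding header value.
public static class MhtDecoding
{
    public static byte[] DecodeBody(string body, string transferEncoding)
    {
        switch ((transferEncoding ?? "").Trim().ToLowerInvariant())
        {
            case "base64":
                // Convert.FromBase64String ignores the embedded line breaks.
                return Convert.FromBase64String(body);
            case "quoted-printable":
                return DecodeQuotedPrintable(body);
            default:
                // 7bit/8bit/binary: treated as already-decoded text (UTF-8 is an assumption).
                return Encoding.UTF8.GetBytes(body);
        }
    }

    // Quoted-printable (RFC 2045, section 6.7): "=XX" is a hex-encoded byte,
    // "=" at the end of a line is a soft line break, everything else is literal ASCII.
    static byte[] DecodeQuotedPrintable(string input)
    {
        string text = input.Replace("\r\n", "\n");
        var bytes = new List<byte>(text.Length);
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c != '=')
            {
                bytes.Add((byte)c); // assumes literal characters are ASCII, as QP requires
            }
            else if (i + 1 < text.Length && text[i + 1] == '\n')
            {
                i++; // soft line break: drop "=" and the newline
            }
            else if (i + 2 < text.Length && Uri.IsHexDigit(text[i + 1]) && Uri.IsHexDigit(text[i + 2]))
            {
                bytes.Add(Convert.ToByte(text.Substring(i + 1, 2), 16));
                i += 2;
            }
            else
            {
                bytes.Add((byte)'='); // malformed escape: keep "=" as-is
            }
        }
        return bytes.ToArray();
    }
}
```

For a Flash part, the decoded bytes can be written out with File.WriteAllBytes. For the HTML part, the bytes still need to be turned into text using the charset parameter from that part's Content-Type header.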
