
C# extract HTML from MHT file

I have a C# module that extracts information from an HTML file, but my input is an MHT file. How do I go about extracting just the HTML portion of the MHT file?

I tried several tools and libraries that reportedly could extract the contents of an MHT, but almost all of them failed (I found that the provider of the MHT files did not encode some types correctly). I eventually discovered Total Commander, which let me unpack the MHT and extract just the HTML portion. It was a hack, but it got the job done.

It would seem that there are many tools for creating MHTs and few for unpacking them.
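For anyone who would rather stay inside C#, here is a minimal sketch of pulling the HTML part out by hand. It relies on an MHT file being a MIME multipart message: a boundary declared in the top-level Content-Type header, with each part carrying its own headers and an optional quoted-printable or base64 transfer encoding. The class name `MhtHtmlExtractor` and method `ExtractHtml` are made up for illustration. It assumes the file is multipart, that the HTML part is labelled `text/html`, and that falling back to UTF-8 is acceptable when the declared charset is unavailable. It is not a full MIME parser and has not been tried against the badly encoded files mentioned above.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class MhtHtmlExtractor
{
    // Expects the raw file decoded as ISO-8859-1 so that every char maps to one byte.
    // Returns the decoded first text/html part, or null if none is found.
    public static string ExtractHtml(string mht)
    {
        // The part boundary is declared in the top-level Content-Type header.
        Match boundary = Regex.Match(mht, "boundary=\"?([^\";\\r\\n]+)\"?", RegexOptions.IgnoreCase);
        if (!boundary.Success) return null; // assumption: only multipart files are handled
        string delimiter = "--" + boundary.Groups[1].Value;

        foreach (string part in mht.Split(new[] { delimiter }, StringSplitOptions.None))
        {
            // Within a part, headers and body are separated by the first blank line.
            string normalized = part.Replace("\r\n", "\n");
            int split = normalized.IndexOf("\n\n", StringComparison.Ordinal);
            if (split < 0) continue;
            string headers = normalized.Substring(0, split);
            string body = normalized.Substring(split + 2);

            RegexOptions opts = RegexOptions.IgnoreCase | RegexOptions.Multiline;
            if (!Regex.IsMatch(headers, @"^Content-Type:\s*text/html", opts)) continue;

            // Undo the transfer encoding to get the raw bytes of the HTML.
            string transfer = Regex.Match(headers, @"^Content-Transfer-Encoding:\s*([\w-]+)", opts)
                                   .Groups[1].Value.ToLowerInvariant();
            byte[] bytes;
            if (transfer == "base64") bytes = Convert.FromBase64String(body);
            else if (transfer == "quoted-printable") bytes = DecodeQuotedPrintable(body);
            else bytes = Encoding.GetEncoding("iso-8859-1").GetBytes(body); // 7bit, 8bit, binary

            // Use the declared charset when the runtime knows it; otherwise assume UTF-8.
            Encoding charset = Encoding.UTF8;
            Match cs = Regex.Match(headers, "charset=\"?([\\w-]+)", RegexOptions.IgnoreCase);
            if (cs.Success)
            {
                try { charset = Encoding.GetEncoding(cs.Groups[1].Value); }
                catch (ArgumentException) { /* unknown charset: keep the UTF-8 fallback */ }
            }
            return charset.GetString(bytes);
        }
        return null;
    }

    private static byte[] DecodeQuotedPrintable(string text)
    {
        var output = new MemoryStream();
        for (int i = 0; i < text.Length; i++)
        {
            char c = text[i];
            if (c == '=' && i + 1 < text.Length && text[i + 1] == '\n')
            {
                i += 1; // soft line break: drop "=" and the newline
            }
            else if (c == '=' && i + 2 < text.Length
                     && Uri.IsHexDigit(text[i + 1]) && Uri.IsHexDigit(text[i + 2]))
            {
                output.WriteByte(Convert.ToByte(text.Substring(i + 1, 2), 16)); // "=XX" escape
                i += 2;
            }
            else
            {
                output.WriteByte((byte)c); // literal character
            }
        }
        return output.ToArray();
    }
}
```

Usage would look something like this, with `page.mht` standing in for your own file:

```csharp
string raw = File.ReadAllText("page.mht", Encoding.GetEncoding("iso-8859-1"));
string html = MhtHtmlExtractor.ExtractHtml(raw);
```

Since only the `text/html` part is decoded, parts of other types are skipped without being parsed. That might sidestep encoding problems in those parts, but that is a guess rather than something verified here. A proper MIME library is likely the safer choice if the files are otherwise well formed.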

