
Extract text from an MHT file

I have an MHT file and I want to get all of its text. I thought about using a regex, but the file contains languages other than English, so the raw text contains encoded sequences like A7=A98=D6...

What I need is the equivalent of opening the file in a browser, selecting all the text, and copying and pasting it into Notepad.

Thanks.

Open the file in Internet Explorer and save it as plain text (UTF-8). :) If you need an automated solution, look for an mht to txt converter for your platform or programming language.
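If no ready-made converter is at hand, here is a minimal sketch in Python using only the standard library. It relies on the fact that an MHT file is a MIME archive, so the `email` package can parse it and undo the quoted-printable encoding (the `=D7=A9`-style sequences) before the HTML tags are stripped. The function name `mht_to_text` and the helper class are made up for this illustration, and the sketch assumes the page content is the first `text/html` part in the archive.

```python
import email
from email import policy
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside script/style.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def mht_to_text(raw_bytes):
    """Return the visible text of the first text/html part of an MHT archive.

    Hypothetical helper for illustration; returns "" if no HTML part is found.
    """
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # get_content() decodes the transfer encoding (quoted-printable
            # or base64) and then the declared charset, yielding a str.
            html = part.get_content()
            extractor = _TextExtractor()
            extractor.feed(html)
            extractor.close()
            return "\n".join(extractor.chunks)
    return ""
```

To use it, read the file in binary mode, for example `mht_to_text(open("C:/MyFile.mht", "rb").read())`, and write the result out as UTF-8. The output is a rough approximation of a browser's select-all-and-copy: each text fragment lands on its own line, and layout is not preserved.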

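Actually, you can automate this in PowerShell as well:

$ie = New-Object -ComObject "InternetExplorer.Application"
$ie.Navigate2("file:///C:/MyFile.mht")
# Navigate2 returns immediately, so wait until the document has finished loading
while ($ie.Busy -or $ie.ReadyState -ne 4) { Start-Sleep -Milliseconds 100 }
$text = $ie.Document.documentElement.innerText
$ie.Quit()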


 