For a translation program, I am trying to extract text from an HTML file with about 95% accuracy so that the sentences and links can be translated.
For example:
<div><a href="stack">Overflow</a> <span>Texts <b>go</b> here</span></div>
should give me two segments to translate:
Overflow
Texts <b>go</b> here
Any suggestions, or commercial packages that solve this problem?
I'm not exactly sure what you're asking, but take a look at simplehtmldom, specifically the "Extract Contents from HTML" tab under Quick Start on its front page (I can't link to it directly, sigh). With it you can extract the text of a website without all those pesky tags.
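Plain-text extraction drops every tag, but the expected output in the question keeps inline markup such as `<b>` inside a segment. The following is a rough sketch of that kind of segmentation using Python's standard `html.parser` module (not simplehtmldom, which is a PHP library). It is not a complete solution. The class `SegmentExtractor`, the helper `extract_segments`, and the `SEGMENT_TAGS` set are hypothetical names chosen for illustration. Which tags start a new translation unit is an assumption: `<span>` is included only because the question's example treats it as one. Any other tag is kept verbatim as inline markup.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: tags whose content becomes its own translation unit.
# <span> is listed only because the question's example treats it that way.
SEGMENT_TAGS = {"a", "span", "div", "p", "li", "td", "th", "h1", "h2", "h3", "title"}


class SegmentExtractor(HTMLParser):
    """Collect translatable segments; non-segment tags stay inline as markup."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default: data arrives unescaped
        self.stack = [("", [])]  # (tag, text parts); index 0 is a root buffer
        self.segments = []

    def _emit(self, parts):
        text = " ".join("".join(parts).split())  # collapse runs of whitespace
        if text:
            self.segments.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in SEGMENT_TAGS:
            self.stack.append((tag, []))  # open a new translation unit
        else:
            # Keep the raw inline tag (e.g. "<b>") inside the current segment.
            self.stack[-1][1].append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are kept verbatim, with no end tag.
        self.stack[-1][1].append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SEGMENT_TAGS:
            # Only close the segment if it matches the innermost open one.
            if len(self.stack) > 1 and self.stack[-1][0] == tag:
                self._emit(self.stack.pop()[1])
        else:
            self.stack[-1][1].append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape text so "&" or "<" in the text cannot be mistaken for markup.
        self.stack[-1][1].append(escape(data, quote=False))

    def close(self):
        super().close()
        while self.stack:  # flush unclosed segments and any top-level text
            self._emit(self.stack.pop()[1])


def extract_segments(html_text):
    parser = SegmentExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.segments


sample = '<div><a href="stack">Overflow</a> <span>Texts <b>go</b> here</span></div>'
print(extract_segments(sample))
# ['Overflow', 'Texts <b>go</b> here']
```

Segments are emitted when their closing tag is seen, so an inner segment is listed before the text of its parent. A parent's own text (e.g. the words around a link inside a paragraph) is also joined without a placeholder for the link. A real translation workflow would probably need placeholders for that, which this sketch does not attempt.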