
Parse HTML tags from a string in Java/GWT

Hey, I want to parse some data out of HTML that is sent to me as a string. The data I want is written in UPPERCASE, and I'll call it DATAx here. The length of the data is arbitrary.

http://pastebin.mozilla.org/1270216

There are many more lines like this that I have to parse.

Thanks for the answers!

I've had great luck with jsoup. It uses a jQuery-style DOM node selector and can work with HTML fragments, even very poorly formatted ones.

I don't know about jsoup, but TagSoup is a fantastic HTML-parsing library. I've used it in a production system for a couple of years now, where it has been run against (at least) tens of thousands of web pages in the wild, and we've never had a single failure from TagSoup. It handles even the most horribly formatted HTML imaginable.
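Neither jsoup nor TagSoup ships with the JDK, so the sketch below sticks to the standard SAX API (`org.xml.sax`) that TagSoup plugs into: TagSoup's `org.ccil.cowan.tagsoup.Parser` implements `XMLReader`, so a handler like this one should be settable on it via `setContentHandler`. The class name `TagTextCollector`, its `extract` method, and the sample markup are hypothetical; the pastebin sample isn't reproduced here, so the sketch assumes each DATAx value sits in its own `<td>`. As written it uses the JDK's built-in parser, which only accepts well-formed markup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical SAX handler that collects the trimmed text of every element
 * with a given tag name (matched case-insensitively).
 */
public class TagTextCollector extends DefaultHandler {
    private final String tagName;
    private final List<String> values = new ArrayList<>();
    private StringBuilder current; // non-null while inside a matching element

    public TagTextCollector(String tagName) {
        this.tagName = tagName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equalsIgnoreCase(tagName)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so append rather than assign
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null && qName.equalsIgnoreCase(tagName)) {
            values.add(current.toString().trim());
            current = null;
        }
    }

    /** Parses the markup with the JDK's SAX parser (well-formed input only). */
    public static List<String> extract(String markup, String tagName) {
        TagTextCollector handler = new TagTextCollector(tagName);
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            // Malformed markup ends up here; a lenient parser such as TagSoup is meant for that case
            throw new IllegalArgumentException("could not parse markup", e);
        }
        return handler.values;
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>DATA1</td><td>DATA2</td></tr></table>";
        System.out.println(extract(html, "td")); // prints [DATA1, DATA2]
    }
}
```

To try it against messy real-world HTML, the `SAXParserFactory` line would be replaced with `new org.ccil.cowan.tagsoup.Parser()` and TagSoup added to the classpath; that variant is untested here. Either way this is JVM code (for example, the server side of a GWT app), since GWT's client-side JRE emulation does not include the SAX classes.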

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM