I have a fixed, well-structured HTML source. The incoming data is clean and small; it just contains a short list of divs. I know the usual advice is to use an HTML parser for parsing HTML, but this looks like a special case and I am not sure which approach I should use. The problem conditions are below.
Any opinion is valuable, so what should I do?
I would still stick with an HTML parser because, at the very least, there is a specific data format and a specialized tool that understands that format.
If performance matters here, there is the blazingly fast lxml package. For HTML, use lxml.html.
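As a rough illustration of the lxml.html route, here is a minimal sketch. The sample markup, the "item" class, and the helper name extract_div_texts are assumptions made up for the example, since the question only describes the input as a small list of divs.

```python
from lxml import html

# Hypothetical input: stands in for the "small list of divs" from the question.
SOURCE = """
<html><body>
  <div class="item">first</div>
  <div class="item">second</div>
  <div class="item">third</div>
</body></html>
"""


def extract_div_texts(markup):
    """Parse the markup with lxml.html and return the text of every div."""
    tree = html.fromstring(markup)
    # XPath selects every div in the document; text_content() flattens
    # any nested markup inside each div into plain text.
    return [div.text_content().strip() for div in tree.xpath("//div")]


texts = extract_div_texts(SOURCE)
print(texts)  # ['first', 'second', 'third']
```

For a fixed, well-formed source like this, the parser version is about as short as a regex would be, and it keeps working if attributes or whitespace change.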
You can also use the excellent BeautifulSoup package and let it use the lxml parser under the hood. Besides, if the data you need is in a specific part of the HTML document, you can gain performance by asking BeautifulSoup to parse only the relevant part of the document; see "Parsing only part of a document" in the BeautifulSoup documentation.
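Parsing only part of a document is done with SoupStrainer and the parse_only argument. The sketch below assumes the interesting divs carry a class named "item"; that class and the sample markup are hypothetical, so adjust the filter to whatever your real source uses. It also assumes both bs4 and lxml are installed.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical input: three relevant divs plus one we do not care about.
SOURCE = """
<html><body>
  <div id="footer">ignore me</div>
  <div class="item">first</div>
  <div class="item">second</div>
  <div class="item">third</div>
</body></html>
"""

# Only tags matching the strainer are turned into objects; the rest of
# the document is skipped while the tree is being built.
only_items = SoupStrainer("div", attrs={"class": "item"})

# "lxml" selects the lxml-based HTML parser as the backend.
soup = BeautifulSoup(SOURCE, "lxml", parse_only=only_items)

texts = [div.get_text(strip=True) for div in soup.find_all("div")]
print(texts)
```

Note that parse_only is not supported by the html5lib backend, so this only helps with lxml or the built-in html.parser.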
And, to follow the tradition for HTML+regex threads, here is the reference to the famous topic covering the reasons why you should not use regex for parsing HTML: