
python beautifulsoup : lxml html.parser

Original 2016-06-20 23:34:46 8 2 python/ beautifulsoup/ lxml/ html-parser

I have to use BeautifulSoup, but I don't know which parser to choose. I am hesitating between lxml and html.parser, or perhaps using both. How can I tell whether a web page is compatible with lxml? How can I tell whether it is compatible with html.parser? Many thanks.

2 answers

There is no silver bullet. Different HTML parsers behave differently, and you should pick the one that works for your particular page. "Works" here basically means that you can get to the data you want.

The lxml parser is generally faster, and html5lib is the most lenient. This kind of difference matters when the HTML you have to parse is broken or not well-formed. html.parser is built in, so it helps you avoid extra dependencies if that is a concern. The Beautiful Soup documentation has a related table that highlights the differences.
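One way to see these differences for yourself is to feed the same broken snippet to every parser you have installed and compare the resulting trees. This is only a minimal sketch: the helper name `serialize_with_each_parser` is made up for illustration, and it assumes `bs4` is installed, while lxml and html5lib are treated as optional and skipped if missing.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def serialize_with_each_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Parse `html` with each available parser and return {parser: tree as text}.

    Hypothetical helper for illustration; parsers that are not installed
    are silently skipped.
    """
    results = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # bs4 raises this when the requested parser library is missing.
            continue
        results[parser] = str(soup)
    return results


# A deliberately invalid snippet: a stray closing </p> inside an <a>.
broken = "<a></p>"
for parser, tree in serialize_with_each_parser(broken).items():
    print(f"{parser:12} {tree}")
```

If the printed trees differ for your page, the parsers disagree about how to repair its markup, and that is the case where the choice of parser affects what you can extract.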

I learned this the hard way, and it was killing me. I just couldn't figure out why the tag I wanted included something that wasn't actually inside that tag. It turned out that html.parser wasn't handling that site correctly. After hours of headache, I tried switching to the lxml parser, and lo and behold, the unwanted stuff was gone, as it should have been!
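A quick check for this kind of problem is to extract the same tag with each parser and compare what comes back. The sketch below is an assumption-laden illustration, not a fix for any particular site: the helper `tag_text_by_parser` and the sample markup are hypothetical, and parsers that are not installed are skipped.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def tag_text_by_parser(html, name, attrs=None,
                       parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser: text of the first matching tag, or None if not found}.

    Hypothetical helper for illustration only.
    """
    results = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            continue  # this parser library is not installed
        tag = soup.find(name, attrs=attrs or {})
        results[parser] = tag.get_text(strip=True) if tag is not None else None
    return results


# Made-up markup with unclosed <p> tags, similar in spirit to a messy real page.
sample = '<div id="price"><p>42 EUR</div><p>unrelated footer'
for parser, text in tag_text_by_parser(sample, "div", {"id": "price"}).items():
    print(f"{parser:12} {text!r}")
```

If one parser returns extra text for the tag and another does not, the page's markup is being repaired differently, and switching parsers (as described above) is worth trying.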

Python BeautifulSoup html.parser not working

BeautifulSoup failed on html.parser

beautifulsoup html.parser error

BeautifulSoup: what's the difference between 'lxml' and 'html.parser' and 'html5lib' parsers?

What is the meaning of “html.parser” when doing BeautifulSoup(source_code, 'html.parser')?

How to add 'features="html.parser"' to the BeautifulSoup constructor

Unable to extract the contents of a web page using Beautifulsoup with html.parser

Parsing HTML in python3, re, html.parser, or something else?

ImportError: No module named 'html.parser'; 'html' is not a package (python3)

Python Beautiful Soup html.parser returns none


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, please contact yoyou2525@163.com.



solution1
7 ACCEPTED 2016-06-20 23:36:50

solution2
-1 2022-07-12 08:19:46
