BeautifulSoup: what's the difference between 'lxml' and 'html.parser' and 'html5lib' parsers?

Originally posted 2017-08-03 21:06:02 | Tags: python, html, web-scraping, beautifulsoup, lxml

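When using Beautiful Soup, what is the difference between 'lxml', 'html.parser', and 'html5lib'?

When would you use one over another, and what are the benefits of each? Whenever I have used them they seemed interchangeable, but people here have corrected me and told me I should be using a different one. I'd like to strengthen my understanding; I have read a couple of posts here about this, but they don't go into the use cases much at all.

Example: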

soup = BeautifulSoup(response.text, 'lxml')

2 Answers

From the documentation's summary table of advantages and disadvantages:

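html.parser - BeautifulSoup(markup, "html.parser")
- Advantages: batteries included, decent speed, lenient (as of Python 2.7.3 and 3.2)
- Disadvantages: not very lenient (before Python 2.7.3 or 3.2.2)
lxml - BeautifulSoup(markup, "lxml")
- Advantages: very fast, lenient
- Disadvantages: external C dependency
html5lib - BeautifulSoup(markup, "html5lib")
- Advantages: extremely lenient, parses pages the same way a web browser does, creates valid HTML5
- Disadvantages: very slow, external Python dependency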
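
Because lxml and html5lib are external dependencies while html.parser ships with Python, one way to act on the table above is to try parsers in order of preference and fall back to the built-in one. The following is only a sketch: the helper name `make_soup` and the preference order in `PREFERRED_PARSERS` are illustrative assumptions, not part of Beautiful Soup. It relies on Beautiful Soup raising `FeatureNotFound` when the requested parser is not installed.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical preference order: fastest first, built-in parser last.
PREFERRED_PARSERS = ("lxml", "html5lib", "html.parser")


def make_soup(markup, parsers=PREFERRED_PARSERS):
    """Return (soup, parser_name) using the first parser that is installed."""
    for name in parsers:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # This parser library is not available; try the next one.
            continue
    raise RuntimeError("none of the requested parsers is available")


soup, used = make_soup("<p>Hello <b>world</b></p>")
print(used, soup.b.get_text())
```

Falling back silently can change the resulting tree from one machine to another, so if the exact output matters it is probably safer to name a single parser explicitly and install it everywhere.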

The key differences are highlighted in the BeautifulSoup documentation:

Differences between parsers

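The basic reasoning for preferring one parser over the others:

html.parser - built in - no extra dependencies needed
html5lib - the most lenient - better to use it if the HTML is broken
lxml - the fastest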
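
The parsers stop looking interchangeable once the markup is invalid. The sketch below feeds the same broken fragment to each parser and records the resulting document as a string. The helper name `parse_with_each` is an illustrative assumption, and parsers that are not installed are reported as `None` rather than raising.

```python
from bs4 import BeautifulSoup, FeatureNotFound

BROKEN = "<a></p>"  # invalid: a closing </p> with no matching opening <p>


def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Parse the same markup with each parser; None marks a missing parser."""
    results = {}
    for name in parsers:
        try:
            results[name] = str(BeautifulSoup(markup, name))
        except FeatureNotFound:
            results[name] = None
    return results


results = parse_with_each(BROKEN)
for name, out in results.items():
    print(f"{name:12} -> {out if out is not None else '(not installed)'}")
```

For this fragment, html.parser simply drops the stray `</p>` and yields `<a></a>`. According to the Beautiful Soup documentation, lxml also drops it but wraps the result in `<html><body>`, while html5lib pairs it with an opening `<p>` and adds a `<head>` as well. Exact output may vary with the installed parser versions.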
