Beautiful Soup findAll does not count all divs

Question

from bs4 import BeautifulSoup

html = 'index.html'
soup = BeautifulSoup(open(html))
print(len(soup.findAll('div')))

where the file index.html is the source code of this shopping webpage.

My code reports that only 1 div tag was found. What's weirder is that findAll('a') returns a huge (so probably correct) number. span and other tags work too, but not div.

Answer 1

You are experiencing the differences between the parsers that BeautifulSoup uses under the hood.

Choose either html.parser or html5lib:

>>> from bs4 import BeautifulSoup
>>> html = 'index.html'
>>> soup = BeautifulSoup(open(html), 'html')
>>> len(soup.findAll('div'))
0
>>> soup = BeautifulSoup(open(html), 'lxml')
>>> len(soup.findAll('div'))
0
>>> soup = BeautifulSoup(open(html), 'html.parser')
>>> len(soup.findAll('div'))
774
>>> soup = BeautifulSoup(open(html), 'html5lib')
>>> len(soup.findAll('div'))
774

Note that if you don't specify a parser, BeautifulSoup picks one automatically:

If you don't specify anything, you'll get the best HTML parser that's installed. Beautiful Soup ranks lxml's parser as being the best, then html5lib's, then Python's built-in parser.
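To check how the parsers differ on your own page, a small comparison helper can be useful. The following is only a sketch: count_tags and the sample markup are hypothetical names made up for illustration, and the helper simply skips any parser that is not installed. On a small, well-formed snippet like the sample the parsers should agree; the differences show up on messy real-world pages.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_tags(markup, tag, parsers=("lxml", "html.parser", "html5lib")):
    """Return {parser_name: tag_count} for every parser that is installed.

    count_tags is a hypothetical helper, not part of Beautiful Soup.
    """
    counts = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(markup, parser)
        except FeatureNotFound:
            # The library behind this parser is not installed; skip it.
            continue
        counts[parser] = len(soup.find_all(tag))
    return counts


# Hypothetical sample markup standing in for the real page.
sample = "<html><body><div><div>a</div></div><div>b</div></body></html>"
print(count_tags(sample, "div"))
```

For the page from the question, you could read the file first, for example with open('index.html') as f: markup = f.read(), and pass markup to count_tags. If the counts disagree, name the parser you want explicitly instead of relying on the default.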

1 2014-12-07 06:10:20
