
Python BeautifulSoup Parsing Not Working

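I mainly use Python for data analysis and am new to scraping. I am trying to learn the BeautifulSoup package, and I am having trouble getting the following code to work.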

from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://pythonscraping.com/pages/warandpeace.html')
bsobj = BeautifulSoup(html)
name_list = bsobj.findAll('span',{'class':'green'})

I get an empty list.

Apparently the problem is in the fourth line, but I don't know why. Everything here is standard, and I can't see what went wrong.

bsobj.prettify() 

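returns ''

However, when I call html.read(), I can see that all of the HTML is there. The answers below do not solve the problem. The problem clearly comes from the fourth line. It makes no difference whether I use bsobj.findAll() or bsobj.find_all(); they are equivalent, and as I mentioned, bsobj.prettify() returns ''.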

I think the line should be bsobj = BeautifulSoup(html.read())
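
One possible explanation for the empty result, offered as a sketch rather than a confirmed diagnosis: a response object is a stream, so if html.read() has already been called (for example, to inspect the page), nothing is left for BeautifulSoup(html) to parse. The example below uses io.BytesIO and a hypothetical SAMPLE document as stand-ins for the real response, and passes 'html.parser' explicitly; none of these names come from the original code.

```python
import io
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page (assumption, not the real HTML).
SAMPLE = b'<html><body><span class="green">Anna Pavlovna</span></body></html>'

def parse_green(stream):
    """Parse a file-like object and return the text of every green span."""
    soup = BeautifulSoup(stream, 'html.parser')
    return [span.get_text() for span in soup.find_all('span', {'class': 'green'})]

# Fresh stream: the parser sees the whole document.
fresh = io.BytesIO(SAMPLE)
fresh_names = parse_green(fresh)

# Stream already consumed by read(): the parser receives no data.
consumed = io.BytesIO(SAMPLE)
consumed.read()
consumed_names = parse_green(consumed)

print(fresh_names)
print(consumed_names)
```

If this is the cause, calling urlopen again to get a fresh response before constructing the BeautifulSoup object should avoid the empty result.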

findall is wrong. Use find_all instead:

bsobj.find_all('span',{'class':'green'})

It returns

[<span class="green">Anna
 Pavlovna Scherer</span>, <span class="green">Empress Marya
 Fedorovna</span>, <span class="green">Prince Vasili Kuragin</span>, <span class="green">Anna Pavlovna</span>, <span class="green">St. Petersburg</span>, <span class="green">the prince</span>, <span class="green">Anna Pavlovna</span>, <span class="green">Anna Pavlovna</span>, <span class="green">the prince</span>, <span class="green">the prince</span>, <span class="green">the prince</span>, <span class="green">Prince Vasili</span>, <span class="green">Anna Pavlovna</span>, <span class="green">Anna Pavlovna</span>, <span class="green">the prince</span>, <span class="green">Wintzingerode</span>, <span class="green">King of Prussia</span>, <span class="green">le Vicomte de Mortemart</span>, <span class="green">Montmorencys</span>, <span class="green">Rohans</span>, <span class="green">Abbe Morio</span>, <span class="green">the Emperor</span>, <span class="green">the prince</span>, <span class="green">Prince Vasili</span>, <span class="green">Dowager Empress Marya Fedorovna</span>, <span class="green">the baron</span>, <span class="green">Anna Pavlovna</span>, <span class="green">the Empress</span>, <span class="green">the Empress</span>, <span class="green">Anna Pavlovna's</span>, <span class="green">Her Majesty</span>, <span class="green">Baron
 Funke</span>, <span class="green">The prince</span>, <span class="green">Anna
 Pavlovna</span>, <span class="green">the Empress</span>, <span class="green">The prince</span>, <span class="green">Anatole</span>, <span class="green">the prince</span>, <span class="green">The prince</span>, <span class="green">Anna
 Pavlovna</span>, <span class="green">Anna Pavlovna</span>]
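
Several of the names in the output above contain line breaks carried over from the page source. As a small sketch, assuming only the text of each span is wanted, get_text() combined with a whitespace split and join collapses those breaks. The SNIPPET string and the clean_names helper are hypothetical and only illustrate the idea.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the line breaks seen in the real output.
SNIPPET = '<p><span class="green">Anna\n Pavlovna Scherer</span> and <span class="green">the prince</span></p>'

def clean_names(markup):
    """Return the text of each green span with runs of whitespace collapsed."""
    soup = BeautifulSoup(markup, 'html.parser')
    spans = soup.find_all('span', {'class': 'green'})
    return [' '.join(span.get_text().split()) for span in spans]

names = clean_names(SNIPPET)
print(names)
```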

