Python: AttributeError: 'NoneType' object has no attribute 'findNext'
I am trying to scrape a website with BeautifulSoup but am having a problem. I was following a tutorial written for Python 2.7; it used exactly the same code and had no problems.
import urllib.request
from bs4 import *
htmlfile = urllib.request.urlopen("http://en.wikipedia.org/wiki/Steve_Jobs")
htmltext = htmlfile.read()
soup = BeautifulSoup(htmltext)
title = (soup.title.text)
body = soup.find("Born").findNext('td')
print (body.text)
If I try to run the program, I get:
Traceback (most recent call last):
File "C:\Users\USER\Documents\Python Programs\World Population.py", line 13, in <module>
body = soup.find("Born").findNext('p')
AttributeError: 'NoneType' object has no attribute 'findNext'
Is this a problem with Python 3, or am I just too naive?
The find and find_all methods do not search for arbitrary text in the document; they search for HTML tags. The documentation makes that clear (my italics):
Pass in a value for name and you'll tell Beautiful Soup to only consider tags with certain names. Text strings will be ignored, as will tags whose names that don't match.
This is the simplest usage:
soup.find_all("title")
# [<title>The Dormouse's story</title>]
That's why your soup.find("Born") is returning None, and hence why it complains about NoneType (the type of None) having no findNext() method.
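As a minimal sketch of that behaviour, the snippet below runs find against a small hand-written document (the markup and variable names are made up for illustration, and "html.parser" is just one parser choice). It shows a tag-name lookup succeeding, a text lookup returning None, and a guard that avoids chaining a call onto None:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, used only for illustration.
html = "<html><head><title>Demo</title></head><body><p>Born in 1955</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

by_tag = soup.find("p")      # "p" is a tag name, so a Tag is returned
by_text = soup.find("Born")  # there is no <Born> tag, so this is None

print(by_tag.text)
print(by_text)

# Check for None before calling another method on the result.
if by_text is None:
    message = "no <Born> tag found"
else:
    message = by_text.find_next("td").text
print(message)
```

Checking for None first turns the AttributeError into an explicit, readable branch.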
The page you reference contains (at the time this answer was written) eight copies of the word "born", none of which are tags.
Looking at the HTML source for that page, you'll find the best option may be to look for the correct span (formatted for readability):
<th scope="row" style="text-align: left;">Born</th>
<td>
<span class="nickname">Steven Paul Jobs</span><br />
<span style="display: none;">(<span class="bday">1955-02-24</span>)</span>February 24, 1955<br />
</td>
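One possible way to get what the original code seems to have intended (the cell next to the "Born" header) is sketched below. It runs against a copy of the fragment above wrapped in a made-up table rather than the live page, and it assumes a Beautiful Soup version that supports the string argument (older releases call it text):

```python
from bs4 import BeautifulSoup

# The fragment shown above, wrapped in a table for illustration only.
html = """
<table><tr>
<th scope="row" style="text-align: left;">Born</th>
<td>
<span class="nickname">Steven Paul Jobs</span><br />
<span style="display: none;">(<span class="bday">1955-02-24</span>)</span>February 24, 1955<br />
</td>
</tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

# Find the <th> tag whose text is exactly "Born", then the next <td> after it.
header = soup.find("th", string="Born")
cell = header.find_next("td") if header is not None else None

if cell is not None:
    nickname = cell.find("span", class_="nickname").text
    bday = cell.find("span", class_="bday").text
    print(nickname)
    print(bday)
```

The key difference from soup.find("Born") is that the tag name ("th") and the text ("Born") are passed as separate arguments.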
The find method looks for tags, not text. To find the name, birthday and birthplace, you would have to look up the span elements with the corresponding class name, and access the text attribute of that item:
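import urllib.request
from bs4 import BeautifulSoup
# Naming the parser explicitly avoids Beautiful Soup's "no parser was explicitly specified" warning.
soup = BeautifulSoup(urllib.request.urlopen("http://en.wikipedia.org/wiki/Steve_Jobs"), "html.parser")
title = soup.title.text
# Each lookup selects a <span> by its class attribute and reads its text.
name = soup.find('span', {'class': 'nickname'}).text
bday = soup.find('span', {'class': 'bday'}).text
birthplace = soup.find('span', {'class': 'birthplace'}).text
print(name)
print(bday)
print(birthplace)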
Output:
Steven Paul Jobs
1955-02-24
San Francisco, California, US
PS: You don't have to call read on the result of urlopen; BS accepts file-like objects.
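A small sketch of that point, using an in-memory io.BytesIO object as a stand-in for the response returned by urlopen (the markup and the variable names are made up for illustration):

```python
import io

from bs4 import BeautifulSoup

# A file-like object standing in for the HTTP response; it exposes read().
fake_response = io.BytesIO(b"<html><head><title>Steve Jobs</title></head></html>")

# No explicit .read() call: the constructor reads from the object itself.
soup = BeautifulSoup(fake_response, "html.parser")
page_title = soup.title.text
print(page_title)
```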