Beautiful Soup error
I'm using the Beautiful Soup module to scrape the titles of a list of web pages saved in a CSV file. The script appears to work fine, but once it reaches the 82nd domain it produces the following error:
    Traceback (most recent call last):
      File "soup.py", line 31, in <module>
        print soup.title.renderContents() # 'Google'
    AttributeError: 'NoneType' object has no attribute 'renderContents'
I'm fairly new to Python, so I'm not sure I understand the error. Would anyone be able to clarify what's going wrong?
My code is:
    import csv
    import socket
    from urllib2 import Request, urlopen, URLError, HTTPError
    from BeautifulSoup import BeautifulSoup

    debuglevel = 0
    timeout = 5
    socket.setdefaulttimeout(timeout)
    domains = csv.reader(open('domainlist.csv'))
    f = open('souput.txt', 'w')
    for row in domains:
        domain = row[0]
        req = Request(domain)
        try:
            html = urlopen(req).read()
            print domain
        except HTTPError, e:
            print 'The server couldn\'t fulfill the request.'
            print 'Error code: ', e.code
        except URLError, e:
            print 'We failed to reach a server.'
            print 'Reason: ', e.reason
        else:
            # everything is fine
            soup = BeautifulSoup(html)
            print soup.title # '<title>Google</title>'
            print soup.title.renderContents() # 'Google'
            f.writelines(domain)
            f.writelines(" ")
            f.writelines(soup.title.renderContents())
            f.writelines("\n")
What if a page doesn't have a title?
I had this problem once. Just put the code in a try/except block, or check for a title first.
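The try/except option might look roughly like the sketch below. It is written for Python 3 rather than the Python 2 of the question, and `title_or_none` is a hypothetical helper name, not part of Beautiful Soup. The two `SimpleNamespace` objects are stand-ins for parsed pages (an assumption, so the sketch runs without fetching anything); a real soup object would be passed in the same way.

```python
from types import SimpleNamespace


def title_or_none(soup):
    """Return the rendered <title> contents, or None if the page has no title."""
    try:
        # soup.title is None when the document has no <title> tag, so the
        # call below raises AttributeError in that case.
        return soup.title.renderContents()
    except AttributeError:
        return None


# Stand-ins for parsed pages (assumptions for illustration only):
# one page with a title, one without.
with_title = SimpleNamespace(
    title=SimpleNamespace(renderContents=lambda: "Google")
)
without_title = SimpleNamespace(title=None)

print(title_or_none(with_title))     # Google
print(title_or_none(without_title))  # None
```

Catching only AttributeError keeps unrelated failures visible instead of hiding them behind a bare except.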
As maozet said, your problem is that soup.title is None for that page. You can check for that value to avoid the issue, like this:
    soup = BeautifulSoup(html)
    # soup.title is None when the page has no <title> tag
    if soup.title is not None:
        print soup.title # '<title>Google</title>'
        print soup.title.renderContents() # 'Google'
        f.writelines(domain)
        f.writelines(" ")
        f.writelines(soup.title.renderContents())
        f.writelines("\n")
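If you would rather keep a line in the output file for every domain, including those without a title, the same check can supply a placeholder instead of skipping the page. This is only a sketch under a few assumptions: it uses Python 3, `write_titles` and the `"(no title)"` placeholder are made-up names, and the `SimpleNamespace` objects and example domains stand in for real parsed pages so that it runs offline.

```python
import io
from types import SimpleNamespace


def write_titles(pages, out, placeholder="(no title)"):
    """Write one 'domain title' line per page; use a placeholder if the title is missing."""
    for domain, soup in pages:
        if soup.title is not None:
            title = soup.title.renderContents()
        else:
            # No <title> tag on this page: record that fact and carry on
            # with the next domain instead of crashing.
            title = placeholder
        out.write("%s %s\n" % (domain, title))


# Stand-ins for (domain, parsed page) pairs; hypothetical data for illustration.
pages = [
    ("http://a.example", SimpleNamespace(
        title=SimpleNamespace(renderContents=lambda: "Site A"))),
    ("http://b.example", SimpleNamespace(title=None)),
]

buf = io.StringIO()  # stands in for the output file
write_titles(pages, buf)
print(buf.getvalue())
```

With a real file, `out` would be the open file object; the loop then reaches the end of the list even when some pages lack a title.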
I was facing the same problem, but reading a couple of related questions and some googling helped me through. Here is what I would suggest for handling specific errors such as NoneType:
    import urllib2
    from bs4 import BeautifulSoup  # assumption: bs4, since get_text() is not available in BeautifulSoup 3

    soup = BeautifulSoup(urllib2.urlopen('http://webpage.com').read())
    scrapped = soup.find(id='whatweseekfor')
    if scrapped is None:
        # find() matched nothing: handle the missing element, e.g.
        print None
    else:
        # the element exists, so it is safe to call methods on it, e.g.
        print scrapped.get_text()
Good luck!