Error in web scraping loop with Python and Beautiful Soup
I am trying to scrape data from this website: https://knowyourcity.info/explore-our-data/
I have put the URLs of all the data pages into an object called urllist and written the following loop:
from requests import get
from bs4 import BeautifulSoup

name = []
year = []
country = []
population = []
taps = []
toiletsToPerson = []

for u in urllist:
    response = get(u)
    html_soup = BeautifulSoup(response.text, "html.parser")

    # Settlement name and year established
    headers_containers = html_soup.find('div', class_='settlement-base-status section text-center')
    names = headers_containers.h2.text
    name.append(names)
    year_established = headers_containers.h3.text
    year.append(year_established)

    # Country
    headers1_containers = html_soup.find('div', class_='col-xs-12 text-center')
    countries = headers1_containers.h4.a.text
    country.append(countries)

    # Population
    headers2_containers = html_soup.find('div', class_='bold-it', id='population')
    populations = headers2_containers.text
    population.append(populations)

    # Shared taps
    headers3_containers = html_soup.find('div', class_='bold-it', id='sharedTaps')
    tap = headers3_containers.text
    taps.append(tap)

    # Toilet-seat-to-person ratio (eighth 'bold-it' div on the page)
    headers4_containers = html_soup.find_all('div', class_='bold-it')
    toiletSeat_toPerson = headers4_containers[7].text
    toiletsToPerson.append(toiletSeat_toPerson)
When I run these commands for a single URL they work, but when I try to run the loop I get this error:
File "<ipython-input-472-0f7d711bfd3f>", line 5, in <module>
names = headers_containers.h2.text
AttributeError: 'NoneType' object has no attribute 'h2'
Why is this happening?
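For context, the same error can be reproduced without any network access. The sketch below assumes a made-up page (the `html` string is hypothetical, not taken from the site) that does not contain the div the loop looks for; in that case `find()` returns `None`, and accessing `.h2` on `None` raises the `AttributeError` shown above.

```python
from bs4 import BeautifulSoup

# Hypothetical page that lacks the <div> the loop is looking for.
html = "<html><body><p>Page not found</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# find() returns None when no element matches.
container = soup.find("div", class_="settlement-base-status section text-center")
print(container)

# Attribute access on None raises AttributeError, as in the traceback.
try:
    container.h2.text
except AttributeError as exc:
    message = str(exc)
    print(message)
```

So the error most likely means that at least one URL in urllist returns a page with a different layout from the one that was tested by hand.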
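Your urllist is not defined in the code you posted; are you sure it is correct? The error means that find() returned None for at least one page, i.e. that page has no matching div. You can also use try + except to handle parsing errors: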
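for u in urllist:
    response = get(u)
    html_soup = BeautifulSoup(response.text, "html.parser")
    try:
        headers_containers = html_soup.find('div', class_='settlement-base-status section text-center')
        names = headers_containers.h2.text
        name.append(names)
    except AttributeError:
        # find() returned None: this page lacks the expected div, so skip it
        continue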
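An alternative to catching the exception is to check for None explicitly and record which pages failed. The following is only a sketch under stated assumptions: `parse_settlement` is a hypothetical helper, the example.com URLs and HTML snippets in `pages` are made-up stand-ins for the real pages, and only the name and year fields are handled.

```python
from bs4 import BeautifulSoup


def parse_settlement(html):
    """Return a dict of fields, or None if the expected header block is missing.

    Hypothetical helper for illustration; it covers only name and year.
    """
    soup = BeautifulSoup(html, "html.parser")
    header = soup.find("div", class_="settlement-base-status section text-center")
    if header is None or header.h2 is None:
        return None
    year = header.h3.text.strip() if header.h3 is not None else None
    return {"name": header.h2.text.strip(), "year": year}


# Made-up stand-ins for downloaded pages (assumption: not real site content).
pages = {
    "https://example.com/ok": (
        '<div class="settlement-base-status section text-center">'
        "<h2>Sample Settlement</h2><h3>1990</h3></div>"
    ),
    "https://example.com/missing": "<p>No data</p>",
}

rows = []    # one dict per successfully parsed page
failed = []  # URLs whose layout did not match

for url, html in pages.items():
    record = parse_settlement(html)
    if record is None:
        failed.append(url)
        continue
    rows.append(record)

print(rows)
print(failed)
```

In the real loop, the HTML would come from `get(u).text` for each `u` in urllist. Collecting one dict per page keeps the fields of a page together, whereas separate lists can drift out of alignment if a page fails after some of its values have already been appended. Inspecting `failed` afterwards shows which URLs need a closer look.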