
Error in web-scraping loop with Python and Beautiful Soup

I am trying to scrape data from this website: https://knowyourcity.info/explore-our-data/

I have put the URLs of all the data pages into a list called urllist and written the following loop:

from requests import get
from bs4 import BeautifulSoup

# urllist holds the data-page URLs collected earlier (not shown here)
name = []
year = []
country = []
population = []
taps = []
toiletsToPerson = []

for u in urllist:
    # Download and parse one settlement page
    response = get(u)
    html_soup = BeautifulSoup(response.text, "html.parser")

    # Settlement name (h2) and year established (h3)
    headers_containers = html_soup.find('div', class_='settlement-base-status section text-center')
    names = headers_containers.h2.text
    name.append(names)
    year_established = headers_containers.h3.text
    year.append(year_established)

    # Country link inside the centred header column
    headers1_containers = html_soup.find('div', class_='col-xs-12 text-center')
    countries = headers1_containers.h4.a.text
    country.append(countries)

    # Population and shared taps, looked up by id
    headers2_containers = html_soup.find('div', class_='bold-it', id='population')
    populations = headers2_containers.text
    population.append(populations)
    headers3_containers = html_soup.find('div', class_='bold-it', id='sharedTaps')
    tap = headers3_containers.text
    taps.append(tap)

    # Toilet seats per person: the 8th "bold-it" div on the page
    headers4_containers = html_soup.find_all('div', class_='bold-it')
    toiletSeat_toPerson = headers4_containers[7].text
    toiletsToPerson.append(toiletSeat_toPerson)

When I run these commands on a single URL they work, but when I run the loop I get this error:

  File "<ipython-input-472-0f7d711bfd3f>", line 5, in <module>
    names = headers_containers.h2.text

AttributeError: 'NoneType' object has no attribute 'h2'

Why is this happening?
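A likely cause: BeautifulSoup's find() returns None when no element matches, and None has no .h2 attribute, which is exactly the AttributeError above. That would happen if some pages in urllist lack that div (for example an error page or a page with a different layout). This is an assumption, since the failing URL isn't shown. The minimal sketch below reproduces it with two made-up HTML snippets; GOOD_PAGE, ODD_PAGE and settlement_name are hypothetical names, not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins: one page has the expected block, the other does not
# (for example an error page or a differently laid-out settlement page).
GOOD_PAGE = """
<div class="settlement-base-status section text-center">
  <h2>Example Settlement</h2><h3>1990</h3>
</div>
"""
ODD_PAGE = "<html><body><p>Page not found</p></body></html>"


def settlement_name(html):
    """Return the <h2> text of the status block, or None if the block is missing."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_="settlement-base-status section text-center")
    if container is None or container.h2 is None:
        # Without this check, container.h2 would raise
        # AttributeError: 'NoneType' object has no attribute 'h2'
        return None
    return container.h2.text


print(settlement_name(GOOD_PAGE))  # Example Settlement
print(settlement_name(ODD_PAGE))   # None
```

Printing u just before the failing lookup (or in an except branch) is a quick way to find out which URL in urllist triggers the error.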

Your urllist isn't defined in the code you posted; are you sure it is correct? You can also use try/except to handle parsing errors:

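try:
    headers_containers = html_soup.find('div', class_='settlement-base-status section text-center')
    names = headers_containers.h2.text
    name.append(names)
except AttributeError:
    # find() returned None because this page lacks the block: skip to the next URL.
    # This must sit inside the "for u in urllist:" loop for continue to be valid.
    continue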
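One caveat with continue: it skips the rest of the loop body for that URL, so if a later field fails after earlier ones were already appended, the parallel lists (name, year, country, ...) drift out of step. A minimal sketch of an alternative, assuming the same selectors as the question's code: parse each page into one dict, store None for any missing field, and check the HTTP status before parsing. parse_settlement, scrape and the fetch parameter are hypothetical names for illustration, and the [7] index into the "bold-it" divs is copied from the question and may not hold on every page.

```python
from bs4 import BeautifulSoup


def _text(tag):
    """Return the stripped text of a tag, or None if the tag is missing."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_settlement(html):
    """Extract the question's fields from one page; missing fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    status = soup.find("div", class_="settlement-base-status section text-center")
    header = soup.find("div", class_="col-xs-12 text-center")
    country_link = header.h4.a if header is not None and header.h4 is not None else None
    bold = soup.find_all("div", class_="bold-it")
    return {
        "name": _text(status.h2) if status is not None else None,
        "year": _text(status.h3) if status is not None else None,
        "country": _text(country_link),
        "population": _text(soup.find("div", class_="bold-it", id="population")),
        "taps": _text(soup.find("div", class_="bold-it", id="sharedTaps")),
        # Index 7 comes from the question; guard against pages with fewer divs.
        "toilets_to_person": _text(bold[7]) if len(bold) > 7 else None,
    }


def _requests_fetch(url):
    """Default fetcher: GET the page with requests and raise on HTTP errors."""
    import requests  # imported lazily so parse_settlement works without requests

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # e.g. a 404 page raises instead of being parsed
    return response.text


def scrape(urls, fetch=_requests_fetch):
    """Return one dict per URL; failed requests are recorded, not silently skipped."""
    rows = []
    for url in urls:
        try:
            html = fetch(url)
        except Exception as exc:  # broad on purpose: fetch is pluggable
            rows.append({"url": url, "error": repr(exc)})
            continue
        rows.append({"url": url, **parse_settlement(html)})
    return rows
```

Because every row has the same keys, a page with a missing field still yields one row, so nothing falls out of alignment; rows = scrape(urllist) would replace the six separate lists.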


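Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.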

 
© 2020-2024 STACKOOM.COM