Error in a web-scraping loop with Python and Beautiful Soup

Question

I am trying to scrape data from this website: https://knowyourcity.info/explore-our-data/

I have put the URLs of all the data pages into an object called urllist and written the following loop:

name = []
year = []
country = []
population = []
taps = []
toiletsToPerson = []

from requests import get
from bs4 import BeautifulSoup

# Fetch each page and pull the fields of interest out of the parsed HTML.
for u in urllist:
    response = get(u)
    html_soup = BeautifulSoup(response.text, "html.parser")
    headers_containers = html_soup.find('div', class_ = 'settlement-base-status section text-center')
    names = headers_containers.h2.text
    name.append(names)
    year_established = headers_containers.h3.text
    year.append(year_established)
    headers1_containers = html_soup.find('div', class_ = 'col-xs-12 text-center')
    countries = headers1_containers.h4.a.text
    country.append(countries)
    headers2_containers = html_soup.find('div', class_ = 'bold-it', id = "population")
    populations = headers2_containers.text
    population.append(populations)
    headers3_containers = html_soup.find('div', class_ ='bold-it', id='sharedTaps')
    tap = headers3_containers.text
    taps.append(tap)
    headers4_containers = html_soup.find_all('div', class_ = 'bold-it')
    toiletSeat_toPerson = headers4_containers[7].text
    toiletsToPerson.append(toiletSeat_toPerson)

These commands work when I run them on a single URL, but when I run the loop I get this error:

  File "<ipython-input-472-0f7d711bfd3f>", line 5, in <module>
    names = headers_containers.h2.text

AttributeError: 'NoneType' object has no attribute 'h2'

Why does this happen?

Answer 1

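urllist is not defined in the code you posted; are you sure it contains the right URLs? The error means that html_soup.find(...) returned None for at least one page, i.e. that page has no div with the class you are searching for. You can also use try/except to handle such parsing errors: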

try:
    headers_containers = html_soup.find('div', class_ = 'settlement-base-status section text-center')
    names = headers_containers.h2.text
    name.append(names)
except AttributeError:
    # find() returned None for this page; skip to the next URL
    continue
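
As an alternative to try/except, the None check can be made explicit so that you also learn which URLs are failing. The following is only a minimal sketch: the helper name extract_name_and_year, the two sample HTML strings, and the example.com URLs are hypothetical stand-ins, not the real pages from the question.

```python
from bs4 import BeautifulSoup


def extract_name_and_year(html):
    """Return (name, year) from a page, or None if the header div is missing."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_="settlement-base-status section text-center")
    # find() returns None when nothing matches; accessing .h2 on None is
    # what raises the AttributeError shown in the question.
    if container is None or container.h2 is None or container.h3 is None:
        return None
    return container.h2.text.strip(), container.h3.text.strip()


# Hypothetical sample pages standing in for response.text of real requests.
good_page = (
    '<div class="settlement-base-status section text-center">'
    "<h2>Example Settlement</h2><h3>1990</h3></div>"
)
bad_page = "<html><body><p>Page not found</p></body></html>"
pages = {
    "https://example.com/good": good_page,
    "https://example.com/bad": bad_page,
}

records = []      # (name, year) tuples for pages that parsed cleanly
failed_urls = []  # URLs whose page lacked the expected markup

for url, html in pages.items():
    result = extract_name_and_year(html)
    if result is None:
        failed_urls.append(url)
        continue
    records.append(result)

print(records)
print(failed_urls)
```

In the real loop you would pass get(u).text for each u in urllist instead of the sample strings. Inspecting failed_urls afterwards should show whether the problem is a bad URL in the list or a page that is laid out differently.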

Solution 1
0 2020-09-09 18:09:39