
Attribute error in web scraping python bs4

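I have written Python code for web scraping. Everything seems fine, but when I run it I get "AttributeError: 'NoneType' object has no attribute 'text'". Please take a look and guide me on how to fix this type of error. Thanks.

Here is my code: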

import pandas
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'}
url = 'https://www.realtor.com/realestateandhomes-search/Orlando_FL/dom-1'

r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
linklist = []
urls = soup.find_all('div', class_ = 'jsx-4195823209 photo-wrap')
for url in urls:
    for link in url.find_all('a', href=True):
        linklist.append('https://www.realtor.com' + link['href'])
#print(linklist)

testurl = 'https://www.realtor.com/realestateandhomes-detail/127-W-Wallace-St_Orlando_FL_32809_M62756-65861'

r = requests.get(testurl, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
address = soup.find('div', class_='jsx-1959108432 address-section').h1.text
print(address)
name = soup.find('a', class_ = 'jsx-725757796 agent-name').text.strip()
print(name)

The problem is most likely in one of the following lines:

address = soup.find('div', class_='jsx-1959108432 address-section').h1.text
name = soup.find('a', class_ = 'jsx-725757796 agent-name').text.strip()

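soup.find returns None when no element matches, so accessing .h1 or .text on its result raises AttributeError. You are reading attributes of an object that is not guaranteed to be non-None on every run of the script, for example when the class name on the page differs from the one in your code.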
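One way to handle this is to check for None before reading the text. The following is only a minimal sketch: the HTML string is a made-up stand-in for the real page, `text_or_none` is a hypothetical helper, and the class names are shortened to the part that looks stable (an assumption, since the `jsx-...` prefixes appear to be generated).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup; the real page may look quite different.
html = "<div class='jsx-1 address-section'><h1> 127 W Wallace St </h1></div>"
soup = BeautifulSoup(html, "html.parser")


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag is missing."""
    return tag.get_text(strip=True) if tag is not None else None


# find() returns None when nothing matches, so test before using .h1
section = soup.find("div", class_="address-section")
address = text_or_none(section.h1 if section is not None else None)

# No <a class="agent-name"> exists in the sample markup, so find() gives None
name = text_or_none(soup.find("a", class_="agent-name"))

print(address)
print(name)
```

Passing a single class name to `class_` matches any element that carries that class, whereas a string with several classes only matches that exact attribute value, so the shorter form may survive a change in the generated prefix.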

You have other options, such as using a try/except block like this:

try:
    address = soup.find('div', class_='jsx-1959108432 address-section').h1.text
except Exception as e:
    print('The following error occurred getting the text from the address: %r' % e)

This approach to exception handling is generic; you can be more specific like this:

try:
    address = soup.find('div', class_='jsx-1959108432 address-section').h1.text
except AttributeError:
    print('Could not get text from address')

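Essentially, you need to do some validation for the following:

  • What if the request fails?
  • What if the class name on the web page changes or does not exist?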
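Both checks can be sketched roughly as follows. This is an illustration under assumptions, not a tested scraper for realtor.com: `fetch_soup` and `extract_listing` are hypothetical helper names, the 10-second timeout is arbitrary, and the CSS selectors assume that `address-section` and `agent-name` are the stable parts of the class names.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, headers=None, timeout=10):
    """Download a page and parse it; raise if the request itself fails."""
    r = requests.get(url, headers=headers, timeout=timeout)
    r.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return BeautifulSoup(r.content, "html.parser")


def extract_listing(soup):
    """Return the scraped fields, using None for anything not found."""
    # select_one() returns None when the selector matches nothing
    address_tag = soup.select_one("div.address-section h1")
    name_tag = soup.select_one("a.agent-name")
    return {
        "address": address_tag.get_text(strip=True) if address_tag is not None else None,
        "agent": name_tag.get_text(strip=True) if name_tag is not None else None,
    }
```

A caller could wrap `fetch_soup(testurl, headers=headers)` in `try/except requests.RequestException` to handle a failed request, then inspect the dictionary from `extract_listing` for None values to detect a changed or missing class name.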

