AttributeError when web scraping with Python bs4

Question

I have written Python code for web scraping and it looks fine to me, but when I run it I get "AttributeError: 'NoneType' object has no attribute 'text'". Please take a look and tell me how to fix this kind of error. Thanks.

Here is my code:

import pandas
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'}
url = 'https://www.realtor.com/realestateandhomes-search/Orlando_FL/dom-1'

r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
linklist = []
urls = soup.find_all('div', class_ = 'jsx-4195823209 photo-wrap')
for url in urls:
    for link in url.find_all('a', href=True):
        linklist.append('https://www.realtor.com' + link['href'])
#print(linklist)

testurl = 'https://www.realtor.com/realestateandhomes-detail/127-W-Wallace-St_Orlando_FL_32809_M62756-65861'

r = requests.get(testurl, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
address = soup.find('div', class_='jsx-1959108432 address-section').h1.text
print(address)
name = soup.find('a', class_ = 'jsx-725757796 agent-name').text.strip()
print(name)

Answer 1

The problem is probably in one of these lines:

address = soup.find('div', class_='jsx-1959108432 address-section').h1.text
name = soup.find('a', class_ = 'jsx-725757796 agent-name').text.strip()

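You are accessing attributes on an object that is not guaranteed to be something other than None on every run of the script. soup.find() returns None when no element matches, and None has no .h1 or .text attribute.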
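To see the failure in isolation, here is a minimal sketch that reproduces it without any network request. The HTML string is made up for illustration and simply lacks the agent link; the class name agent-name is taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a page that has no agent link.
html = "<div class='address-section'><h1>127 W Wallace St</h1></div>"
soup = BeautifulSoup(html, "html.parser")

# find() returns None when nothing matches the tag and class.
missing = soup.find("a", class_="agent-name")
print(missing)

try:
    missing.text  # attribute access on None raises AttributeError
except AttributeError as exc:
    error_message = str(exc)
    print(error_message)
```

The same thing happens on the address line if the div is found but contains no h1, because .h1 then evaluates to None as well.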

One option is to wrap the lookup in a try/except block like this:

try:
    address = soup.find('div', class_='jsx-1959108432 address-section').h1.text
except Exception as e:
    print('The following error occurred getting the text from the address: %r' % e)

That exception handler is generic; you can be more specific like this:

try:
    address = soup.find('div', class_='jsx-1959108432 address-section').h1.text
except AttributeError:
    print('Could not get text from address')
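An alternative to try/except is to test for None explicitly before reading the text. The sketch below assumes a small helper named text_or_default, which is hypothetical and not part of bs4, and uses a made-up HTML string so it runs without a network request.

```python
from bs4 import BeautifulSoup

def text_or_default(tag, default=None):
    """Return the stripped text of a tag, or default if the tag is None."""
    return tag.get_text(strip=True) if tag is not None else default

# Hypothetical HTML: the address is present, the agent link is not.
html = "<div class='address-section'><h1> 127 W Wallace St </h1></div>"
soup = BeautifulSoup(html, "html.parser")

# Check each step of the chain, since both find() and .h1 can be None.
section = soup.find("div", class_="address-section")
address = text_or_default(section.h1 if section is not None else None)

name = text_or_default(soup.find("a", class_="agent-name"),
                       default="(agent not found)")

print(address)
print(name)
```

With this approach a missing element yields a default value and the script keeps running, which is usually what you want when looping over many listing pages.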

Essentially, you need to validate a few things:

What if the request fails?
What if the class name on the page changes or does not exist?
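Both checks can be sketched roughly as follows. The function names fetch_html and extract_address are hypothetical, and the selector rests on an assumption: that the jsx-1959108432 part of the class is generated and may change between deployments, while address-section is the stable part. Verify that against the live page before relying on it.

```python
import requests
from bs4 import BeautifulSoup

def fetch_html(url, headers=None, timeout=10):
    """Download a page, raising requests.HTTPError on a 4xx/5xx response."""
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.text

def extract_address(html):
    """Return the address text, or None if the expected markup is missing."""
    soup = BeautifulSoup(html, "html.parser")
    # Match only on the class assumed to be stable, not the jsx-... one.
    heading = soup.select_one("div.address-section h1")
    return heading.get_text(strip=True) if heading is not None else None

# Usage (not executed here because it needs network access):
# try:
#     html = fetch_html(testurl, headers=headers)
# except requests.RequestException as exc:
#     print('Request failed: %r' % exc)
# else:
#     print(extract_address(html))
```

If extract_address returns None even though the request succeeded, print part of the returned HTML to check whether the site served a different page than the one you see in the browser.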
