
AttributeError when web scraping with Python bs4

I have written Python code for web scraping and everything seems fine, but when I run it I get "AttributeError: 'NoneType' object has no attribute 'text'". Please have a look and tell me how I can fix this type of error. Thanks.

Here is my code:

import pandas
from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'}
url = 'https://www.realtor.com/realestateandhomes-search/Orlando_FL/dom-1'

r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
linklist = []
urls = soup.find_all('div', class_ = 'jsx-4195823209 photo-wrap')
for url in urls:
    for link in url.find_all('a', href=True):
        linklist.append('https://www.realtor.com' + link['href'])
#print(linklist)

testurl = 'https://www.realtor.com/realestateandhomes-detail/127-W-Wallace-St_Orlando_FL_32809_M62756-65861'

r = requests.get(testurl, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
address = soup.find('div', class_='jsx-1959108432 address-section').h1.text
print(address)
name = soup.find('a', class_ = 'jsx-725757796 agent-name').text.strip()
print(name)

The likely issue is with either of these lines:

address = soup.find('div', class_='jsx-1959108432 address-section').h1.text
name = soup.find('a', class_ = 'jsx-725757796 agent-name').text.strip()

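soup.find() returns None when no element matches, so you're accessing attributes (.h1, .text) on an object that isn't guaranteed to be a tag every time the script runs. When the lookup comes back empty, you get the AttributeError on 'NoneType'.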
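To make that concrete, here is a minimal sketch of checking for None before touching the result. The inline HTML is made up for illustration and the class names are shortened (no jsx- prefixes), so it is an assumption about the page structure, not the real realtor.com markup:

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration: it has an address block but no agent link.
html = '<div class="address-section"><h1>127 W Wallace St</h1></div>'
soup = BeautifulSoup(html, 'html.parser')

found = soup.find('div', class_='address-section')  # a Tag
missing = soup.find('a', class_='agent-name')       # None, nothing matched

# Only read .h1 / .text when the lookup actually returned a tag.
if found is not None and found.h1 is not None:
    address = found.h1.text
else:
    address = None

name = missing.text.strip() if missing is not None else None

print(address)
print(name)
```

With this pattern a missing element leaves the variable as None instead of raising, and you can decide later how to handle the gap.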

One option is to wrap the lookup in a try/except block like this:

try:
    address = soup.find('div', class_='jsx-1959108432 address-section').h1.text
except Exception as e:
    print(f'The following error occurred getting the text from the address: {e!r}')

Catching Exception is very broad; you can be more specific and catch only the error you expect, like this:

try:
    address = soup.find('div', class_='jsx-1959108432 address-section').h1.text
except AttributeError:
    print('Could not get text from address')

Essentially, you need some validation for cases such as:

  • What if the request fails?
  • What if the class names on the web page change or don't exist?
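Both checks could be sketched roughly as follows. The helper names fetch_soup and extract_listing are hypothetical, and matching only on address-section and agent-name (ignoring the jsx- prefixes, which look build-generated and may change) is an assumption about the page, not something confirmed here:

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, headers=None, timeout=10):
    """Return parsed HTML, or None if the request fails or returns an error status."""
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
    except requests.RequestException as exc:
        print(f'Request failed for {url}: {exc!r}')
        return None
    return BeautifulSoup(response.content, 'html.parser')


def extract_listing(soup):
    """Return address and agent name; each value is None when its element is missing."""
    # CSS selectors match one class at a time, so the jsx- prefix is not needed.
    address_tag = soup.select_one('div.address-section h1')
    name_tag = soup.select_one('a.agent-name')
    return {
        'address': address_tag.get_text(strip=True) if address_tag is not None else None,
        'agent': name_tag.get_text(strip=True) if name_tag is not None else None,
    }


# Offline usage example with made-up HTML (no agent link present).
sample = '<div class="jsx-1 address-section"><h1> 127 W Wallace St </h1></div>'
listing = extract_listing(BeautifulSoup(sample, 'html.parser'))
print(listing)
```

In the real script you would call fetch_soup(testurl, headers=headers), skip the page if it returns None, and otherwise pass the result to extract_listing.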

