
Why am I receiving an attribute error for my BeautifulSoup code when the variable in question has a value?

I am using Python 3.9.1 with Selenium and BeautifulSoup to create my first web scraper for Tesco's website (a mini project to teach myself). However, when I run the code shown below, I receive an attribute error:

Traceback (most recent call last):
  File "c:\Users\Ozzie\Dropbox\My PC (DESKTOP-HFVRPAV)\Desktop\Tesco\Tesco.py", line 37, in <module>
    clean_product_data = process_products(html)
  File "c:\Users\Ozzie\Dropbox\My PC (DESKTOP-HFVRPAV)\Desktop\Tesco\Tesco.py", line 23, in process_products
    weight = product_price_weight.find("span",{"class":"weight"}).text.strip()
AttributeError: 'NoneType' object has no attribute 'find'

I am unsure what is going wrong: the title and URL sections work fine, but the weight and price sections raise this error. When I print the product_price and product_price_weight variables, they show the values I expect (I won't post the output here, as it is very long HTML).

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager
import time
from bs4 import BeautifulSoup


driver = webdriver.Chrome(ChromeDriverManager().install())

def process_products(html):
    clean_product_list = []
    soup = BeautifulSoup(html, 'html.parser')
    products = soup.find_all("div",{"class":"product-tile-wrapper"})

    for product in products:
        data_dict = {}
        product_details = product.find("div",{"class":"product-details--content"})
        product_price = product.find("div",{"class":"price-control-wrapper"})
        product_price_weight = product.find("div",{"class":"price-per-quantity-weight"})

        data_dict['title'] = product_details.find('a').text.strip()
        data_dict['product_url'] = ('tesco.com') + (product_details.find('a')['href'])
        weight = product_price_weight.find("span",{"class":"weight"}).text.strip()
        data_dict['price'] = product_price.find("span",{"class":"value"}).text.strip()
        data_dict['price'+weight] = product_price_weight.find("span",{"class":"value"}).text.strip()
        clean_product_list.append(data_dict)
    return clean_product_list 


master_list = []

for i in range (1,3):
    print (i)
    driver.get(f"https://www.tesco.com/groceries/en-GB/shop/fresh-food/all?page={i}&count=48")
    html = driver.page_source
    driver.maximize_window()
    clean_product_data = process_products(html)
    master_list.extend(clean_product_data)

print (master_list)

Any help is much appreciated. Many thanks,

You can fix this by updating your process_products function. Note that in some cases a .find() call returns None, which simply means it did not find any element matching the parameters you passed to it. Calling .find() again on that None is what raises the error.
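One way to check this on the real page is to count how many product tiles are missing each element. The sketch below is only an assumption about how that check might look, not part of the original code: count_missing and SELECTORS are hypothetical names, and products is expected to be the list returned by soup.find_all(...) in process_products.

```python
# Hypothetical diagnostic helper; the class names are copied from the question.
SELECTORS = {
    "product_details": ("div", "product-details--content"),
    "product_price": ("div", "price-control-wrapper"),
    "product_price_weight": ("div", "price-per-quantity-weight"),
}


def count_missing(products, selectors=SELECTORS):
    """Return {label: number of tiles where .find() returned None}."""
    missing = {label: 0 for label in selectors}
    for product in products:
        for label, (tag_name, class_name) in selectors.items():
            # .find() returns None when no matching element exists in this tile
            if product.find(tag_name, {"class": class_name}) is None:
                missing[label] += 1
    return missing
```

Printing count_missing(products) inside process_products would show whether some tiles lack the price elements. A non-zero count for product_price_weight would explain the traceback even if printing the variable for other tiles shows valid HTML.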

Take this line as an example, and suppose it has just been executed:

product_details = product.find("div",{"class":"product-details--content"})

If it finds an element with that tag and class, it returns a bs4 object; if not, it returns None. Suppose it returned None.

Your product_details variable is then None, so this later line in your code runs with product_details equal to None:

data_dict['title'] = product_details.find('a').text.strip()
# In other words, this is equivalent to:
# data_dict['title'] = None.find('a').text.strip()  # clearly an error
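The same failure can be reproduced without any HTML at all, which shows that the error comes from the object to the left of .find and not from the arguments passed to it. This is only an illustration; message is a name introduced here.

```python
# Stand-in for a lookup that found nothing in one particular product tile
product_price_weight = None

try:
    # Same call shape as the failing line in the question
    product_price_weight.find("span", {"class": "weight"})
except AttributeError as exc:
    message = str(exc)

print(message)
```

The printed message matches the last line of the traceback in the question, which names 'NoneType' and the attribute 'find'.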

What I did here is wrap each lookup in a try/except that catches those errors and stores an empty string instead, indicating that the variable you called .find() on was probably None (the point being that no relevant data came back). You could also write this as an if/else, but I think a try/except is better.

def process_products(html):
    clean_product_list = []
    soup = BeautifulSoup(html, 'html.parser')
    products = soup.find_all("div",{"class":"product-tile-wrapper"})

    for product in products:
        data_dict = {}
        product_details = product.find("div",{"class":"product-details--content"})
        product_price = product.find("div",{"class":"price-control-wrapper"})
        product_price_weight = product.find("div",{"class":"price-per-quantity-weight"})

        try:
            data_dict['title'] = product_details.find('a').text.strip()
            data_dict['product_url'] = ('tesco.com') + (product_details.find('a')['href'])
        except (AttributeError, KeyError):
            # product_details (or the <a> inside it) was probably None, or the
            # link had no href, so store empty strings to show the lookup failed.
            # Catching BaseException here would also swallow KeyboardInterrupt.
            data_dict['title'] = ''
            data_dict['product_url'] = ''


        try:
            data_dict['price'] = product_price.find("span",{"class":"value"}).text.strip()

        except AttributeError:
            # Same here: product_price or its "value" span was None
            data_dict['price'] =''


        try:
            weight = product_price_weight.find("span",{"class":"weight"}).text.strip()
            data_dict['price'+weight] = product_price_weight.find("span",{"class":"value"}).text.strip()
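        except AttributeError:
            # Same here again. Do not write data_dict['price' + ''] in this branch:
            # that key is just 'price' and would overwrite the price stored above.
            # 'price_per_weight' is a placeholder key chosen for this fallback.
            data_dict['price_per_weight'] = ''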



        clean_product_list.append(data_dict)




    return clean_product_list 
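
For comparison, the if/else variant mentioned above could look like the sketch below. safe_text is a hypothetical helper, not part of BeautifulSoup; it assumes parent is either None or a bs4 Tag (anything with a .find() method whose results expose .text).

```python
def safe_text(parent, tag_name, class_name, default=""):
    """Return the stripped text of the first matching child, or default."""
    if parent is None:
        # The outer lookup already failed, so there is nothing to search
        return default
    node = parent.find(tag_name, {"class": class_name})
    if node is None:
        # The parent exists but has no matching child
        return default
    return node.text.strip()
```

Inside the loop, data_dict['price'] = safe_text(product_price, "span", "value") would then stand in for the corresponding try/except. Which style reads better is a matter of taste; the explicit checks make it visible which lookup is allowed to fail.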
