
Error in Python loop while trying to scrape with BeautifulSoup

Hi, I'm new to web scraping and I'm following a tutorial, but I have trouble accessing certain items. This is the page I want to scrape: https://www.newegg.com/todays-deals?cm_sp=Homepage_4spots-_--_-12182020 . I want to get the title, brand and price of each product. Everything works fine outside of the loop, but I get errors when I write the loop over all the products:

# This is the loop to scrape all items from the webpage
containers = pagesoup.findAll("div", {"class": "item-container"})
for con in containers:
    title = con.img["title"]
    titleco = con.findAll("div", {"class": "item-branding"})
    brand = titleco[0].img["title"]
    priceco = con.findAll("li", {"class": "price-current"})
    priceco[0].text.strip()

I get this error:

----> 5 brand= titleco[0].img["title"]
TypeError: 'NoneType' object is not subscriptable

Not every item-branding div on your page contains an img, so in some cases titleco[0].img is None. That is why you get an error when you try to read its "title" attribute: you are subscripting None.
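To see the failure mode in isolation, here is a minimal sketch using two made-up item-branding snippets (the markup is hypothetical and Newegg's real HTML may differ). It shows that the tag shorthand div.img returns None when the div has no img, and guards for that before reading the attribute. It also uses Tag.get, which returns None instead of raising when the attribute itself is missing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one branding div with a logo <img>, one with text only.
html = """
<div class="item-branding"><img title="ASUS" src="asus.png"></div>
<div class="item-branding"><span>No logo here</span></div>
"""

soup = BeautifulSoup(html, "html.parser")


def brand_from_branding(div):
    """Return the title of the <img> inside an item-branding div, or None."""
    img = div.img  # shorthand for div.find("img"); None when there is no <img>
    if img is None:
        return None
    # Tag.get returns None rather than raising KeyError if "title" is absent
    return img.get("title")


brands = [brand_from_branding(d) for d in soup.find_all("div", class_="item-branding")]
print(brands)  # ['ASUS', None]
```

Calling img["title"] directly on the second div's result is exactly what raises 'NoneType' object is not subscriptable.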

You will run into another issue later with price-current, too: for some items findAll finds zero matches, so accessing the first element of the ResultSet via priceco[0] raises an IndexError. At least it does for me, but the site seems to be partially unavailable in my country, so you may not get the same results.
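A short sketch of two ways to avoid the IndexError, again on hypothetical markup where only the first item has a price-current element. The first keeps find_all and checks whether the ResultSet (a list subclass) is empty before indexing it; the second uses find, which returns the first match or None rather than a list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first item has a current price, the second does not.
html = """
<div class="item-container"><ul><li class="price-current"> $199.99 </li></ul></div>
<div class="item-container"><ul><li class="price-was">$249.99</li></ul></div>
"""

soup = BeautifulSoup(html, "html.parser")
containers = soup.find_all("div", class_="item-container")

# Option 1: keep find_all, but test the ResultSet before indexing it.
# An empty ResultSet is falsy, like an empty list, so priceco[0] is never reached.
prices_checked = []
for con in containers:
    priceco = con.find_all("li", class_="price-current")
    prices_checked.append(priceco[0].text.strip() if priceco else None)

# Option 2: use find, which returns the first matching tag or None.
prices_find = []
for con in containers:
    tag = con.find("li", class_="price-current")
    prices_find.append(tag.text.strip() if tag is not None else None)

print(prices_checked)  # ['$199.99', None]
print(prices_find)     # ['$199.99', None]
```

Option 2 is usually simpler when you only ever want the first match.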

Here's a version of your code that runs:

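containers = pagesoup.findAll("div", {"class": "item-container"})
for con in containers:
    title = con.img["title"]
    # Not every item-branding div contains an <img>, so fall back to None
    # (resetting brand also stops the previous item's brand from leaking through)
    brand = None
    titleco = con.findAll("div", {"class": "item-branding"})
    if titleco and titleco[0].img is not None:
        brand = titleco[0].img["title"]
    # Some items have no price-current element, so findAll can return nothing
    price = None
    priceco = con.findAll("li", {"class": "price-current"})
    if len(priceco) > 0:
        price = priceco[0].text.strip()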
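The loop above does not keep any of the values it extracts once it moves on to the next item. Below is a sketch of one way to collect everything into a list of dicts, with every lookup guarded. The function name parse_items and the sample markup (product names, brand, prices) are hypothetical; the class names come from the question. It also assumes, as the question's code does, that the first img in each item-container is the product image and not the brand logo.

```python
from bs4 import BeautifulSoup


def parse_items(soup):
    """Collect title, brand and price for every item-container.

    Any missing piece becomes None instead of raising an exception.
    """
    items = []
    for con in soup.find_all("div", class_="item-container"):
        # Assumption: the first <img> in the container is the product image
        img = con.img
        title = img.get("title") if img is not None else None

        # find returns None when there is no item-branding div at all
        branding = con.find("div", class_="item-branding")
        brand_img = branding.img if branding is not None else None
        brand = brand_img.get("title") if brand_img is not None else None

        price_tag = con.find("li", class_="price-current")
        price = price_tag.text.strip() if price_tag is not None else None

        items.append({"title": title, "brand": brand, "price": price})
    return items


# Hypothetical sample page: the second item has no brand logo and no price.
html = """
<div class="item-container">
  <a class="item-img"><img title="Example GPU 8GB" src="gpu.jpg"></a>
  <div class="item-branding"><img title="ExampleBrand" src="logo.png"></div>
  <ul class="price"><li class="price-current">$299.99</li></ul>
</div>
<div class="item-container">
  <a class="item-img"><img title="Example SSD 1TB" src="ssd.jpg"></a>
  <div class="item-branding"></div>
</div>
"""

items = parse_items(BeautifulSoup(html, "html.parser"))
for item in items:
    print(item)
```

On the real page you would pass your existing pagesoup to parse_items instead of the sample soup.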
