
Having issues with my first Python web scraper

I am trying to scrape information from a website called supremecommunity for a personal project. They have an archive of items from different seasons, and I am trying to put that information into a CSV file. My code includes a loop that I got from this GitHub page: https://github.com/CharlieAIO/Supreme-Community-Scraper/blob/master/sup.py

Below is the code I am working with. It runs without error, but the CSV remains empty apart from the headers I set. Am I doing something wrong here? Is the website denying my request? Any help or direction is appreciated.

import requests
from bs4 import BeautifulSoup as bs

urls = ['https://www.supremecommunity.com/season/fall-winter2011/overview/']
open("SupremeData.csv","w")

filename = "SupremeData.csv"

headers = "Item,Image,Price,Upvotes,Downvotes"

f = open(filename, "w")
f.write(headers)


for link in urls:

    r = requests.get(link)
    soup = bs(r.text,"html.parser")
    cards = soup.find_all('div',{'class':'card card-2'})

    for card in cards:
        item = card.find("div",{"class":"card-details"})["data-itemname"]
        img = card.find("img",{"class":"prefill-img"})["src"]
        image = f'https://supremecommunity.com{img}'
        price = card.find("span",{"class":"label-price"}).text
        price = price.replace(" ","")
        price = price.replace("\n","")
        upvotes = card.find("p",{"class":"upvotes hidden"}).text
        downvotes = card.find("p",{"class":"downvotes hidden"}).text

        f.write(item + "," + image + "," + price + "," + upvotes + "," + downvotes + "\n")


f.close()    

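This code saves a CSV file with some details for each product. Compared with your version, the card selector matches on the single class 'card-2', the item name is read from the 'card__top' div, and the price and vote lookups fall back to placeholder text when an element is missing.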

import requests
from bs4 import BeautifulSoup as bs

urls = ['https://www.supremecommunity.com/season/fall-winter2011/overview/']
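filename = "SupremeData.csv"

# End the header row with a newline so the first item starts on its own line
headers = "Item,Image,Price,Upvotes,Downvotes\n"

# Open the file once; a second bare open() call would only leak a handle
f = open(filename, "w")
f.write(headers)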


for link in urls:

    r = requests.get(link)
    soup = bs(r.content,"html.parser")
    #print(soup)
    cards = soup.find_all('div',{'class':'card-2'})
    #print(cards)
    #print(len(cards))
    for card in cards:
        item = card.find("div",{"class":"card__top"})["data-itemname"]
        img = card.find("img",{"class":"prefill-img"})["src"]
        image = f'https://supremecommunity.com{img}'
        try:
            price = card.find("span",{"class":"label-price"}).text
            price = price.replace(" ","")
            price = price.replace("\n","")
        except AttributeError:
            # find() returned None: this card has no price element
            price = 'Not Available'
        try:
            upvotes = card.find("p",{"class":"upvotes hidden"}).text
            downvotes = card.find("p",{"class":"downvotes hidden"}).text
        except AttributeError:
            # find() returned None: vote counts are missing for this card
            upvotes = 'Not Found'
            downvotes = 'Not Found'
        print((item + "," + image + "," + price + "," + upvotes + "," + downvotes + "\n"))
        f.write(item + "," + image + "," + price + "," + upvotes + "," + downvotes + "\n")


f.close()   
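
A possible explanation for the empty result in the question, sketched on made-up HTML rather than the real page: when BeautifulSoup is given a class value that contains a space, it only matches tags whose class attribute is exactly that string, whereas a single class name matches any tag that carries it. The markup and item names below are hypothetical; only find_all(), select() and the class names from the code above are reused.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: both divs carry the classes "card" and "card-2",
# but the second one has an extra class and a different order.
html = """
<div class="card card-2"><div class="card__top" data-itemname="A"></div></div>
<div class="card-2 extra card"><div class="card__top" data-itemname="B"></div></div>
"""

soup = BeautifulSoup(html, "html.parser")

# A value with a space is compared against the whole class attribute string
exact = soup.find_all("div", {"class": "card card-2"})

# A single class name matches every tag that has that class
single = soup.find_all("div", {"class": "card-2"})

# A CSS selector requires both classes, in any order, extras allowed
css = soup.select("div.card.card-2")

print(len(exact), len(single), len(css))  # 1 2 2
```

If the live page adds or reorders classes on the card div, the first call comes back empty and the loop body never runs, which would leave only the header row in the file. Printing len(cards) right after find_all() is a quick way to confirm this.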

This is one way to fetch all of the fields mentioned above from that page, except the prices, which are not available there yet. To check, you can click on each image and toggle the price tab. I've used .select() instead of .find_all() to keep it concise. Once you run the script, you should get a populated CSV file containing the required fields.

import csv
import requests
from bs4 import BeautifulSoup

base = 'https://www.supremecommunity.com{}'
link = 'https://www.supremecommunity.com/season/fall-winter2011/overview/'

r = requests.get(link,headers={"User-Agent":"Mozilla/5.0"})
soup = BeautifulSoup(r.text,"lxml")

with open("supremecommunity.csv","w",newline="") as f:
    writer = csv.writer(f)
    writer.writerow(['item_name','item_image','upvote','downvote'])

    for card in soup.select('[class$="d-card"]'):
        item_name = card.select_one('.card__top')['data-itemname']
        item_image = base.format(card.select_one('img.prefill-img').get('data-src'))
        upvote = card.select_one('.progress-bar-success > span').get_text(strip=True)
        downvote = card.select_one('.progress-bar-danger > span').get_text(strip=True)
        writer.writerow([item_name,item_image,upvote,downvote])
        print(item_name,item_image,upvote,downvote)
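
As a side note on why csv.writer is used here instead of joining strings with commas: a rough sketch, assuming an item name that itself contains a comma. The row values are invented for illustration and do not come from the site.

```python
import csv
import io

# Hypothetical row; the item name contains a comma
row = ["Box Logo Tee, Black", "https://example.com/img.jpg", "10", "2"]

# Joining by hand leaves the embedded comma unquoted
naive = ",".join(row) + "\n"

# csv.writer quotes any field that contains the delimiter
buf = io.StringIO()
csv.writer(buf).writerow(row)
quoted = buf.getvalue()

# Read both lines back to see how many columns a CSV reader finds
parsed_naive = next(csv.reader(io.StringIO(naive)))
parsed_quoted = next(csv.reader(io.StringIO(quoted)))

print(len(parsed_naive), len(parsed_quoted))  # 5 4
```

The hand-joined line is read back as five columns, so every value after the item name shifts one place to the right, while the csv.writer line round-trips to the original four fields.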

