
Having issues with my first python web scraper

I am trying to scrape information from a website called supremecommunity for a personal project. They have an archive of items from different seasons, and I am trying to put that information into a csv. My code includes a loop that I got from this github page https://github.com/CharlieAIO/Supreme-Community-Scraper/blob/master/sup.py

Below is the code that I am working with. It runs without error, but the csv remains empty save for the headers that I set. Am I doing something wrong here? Is the website denying my request? Any help or direction appreciated.

import requests
from bs4 import BeautifulSoup as bs

urls = ['https://www.supremecommunity.com/season/fall-winter2011/overview/']
open("SupremeData.csv","w")

filename = "SupremeData.csv"

headers = "Item,Image,Price,Upvotes,Downvotes"

f = open(filename, "w")
f.write(headers)


for link in urls:

    r = requests.get(link)
    soup = bs(r.text,"html.parser")
    cards = soup.find_all('div',{'class':'card card-2'})

    for card in cards:
        item = card.find("div",{"class":"card-details"})["data-itemname"]
        img = card.find("img",{"class":"prefill-img"})["src"]
        image = f'https://supremecommunity.com{img}'
        price = card.find("span",{"class":"label-price"}).text
        price = price.replace(" ","")
        price = price.replace("\n","")
        upvotes = card.find("p",{"class":"upvotes hidden"}).text
        downvotes = card.find("p",{"class":"downvotes hidden"}).text

        f.write(item + "," + image + "," + price + "," + upvotes + "," + downvotes + "\n")


f.close()    
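One thing worth checking locally before blaming the website: BeautifulSoup treats a multi-class string like `'card card-2'` as an exact match against the serialized `class` attribute, not as "has both classes", so a different class order or an extra class on the live page would make `find_all` return nothing. A minimal sketch (the HTML snippet is made up for illustration):

```python
from bs4 import BeautifulSoup

html = """
<div class="card card-2">a</div>
<div class="card-2 card">b</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-class string matches the class attribute verbatim, so only the
# first div (whose attribute is exactly "card card-2") is found:
print(len(soup.find_all('div', {'class': 'card card-2'})))  # 1

# Matching a single class name finds both divs regardless of order:
print(len(soup.find_all('div', {'class': 'card-2'})))  # 2
```

If `len(cards)` is 0 with the two-class string but nonzero with `'card-2'` alone, the empty csv is a selector miss rather than a blocked request.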

This code saves the csv file with some details of the products.

import requests
from bs4 import BeautifulSoup as bs

urls = ['https://www.supremecommunity.com/season/fall-winter2011/overview/']

filename = "SupremeData.csv"

# Trailing newline so the first data row starts on its own line
headers = "Item,Image,Price,Upvotes,Downvotes\n"

f = open(filename, "w")
f.write(headers)


for link in urls:

    r = requests.get(link)
    soup = bs(r.content,"html.parser")
    #print(soup)
    cards = soup.find_all('div',{'class':'card-2'})
    #print(cards)
    #print(len(cards))
    for card in cards:
        item = card.find("div",{"class":"card__top"})["data-itemname"]
        img = card.find("img",{"class":"prefill-img"})["src"]
        image = f'https://supremecommunity.com{img}'
        try:
            price = card.find("span",{"class":"label-price"}).text
            price = price.replace(" ","")
            price = price.replace("\n","")
        except AttributeError:  # no price label on this card
            price = 'Not Available'
        try:
            upvotes = card.find("p",{"class":"upvotes hidden"}).text
            downvotes = card.find("p",{"class":"downvotes hidden"}).text
        except AttributeError:  # vote counts missing for this card
            upvotes = 'Not Found'
            downvotes = 'Not Found'
        print((item + "," + image + "," + price + "," + upvotes + "," + downvotes + "\n"))
        f.write(item + "," + image + "," + price + "," + upvotes + "," + downvotes + "\n")


f.close()   

This is one of the ways you can fetch all the aforementioned fields from that webpage, other than the prices, as they are not available there yet. To check, you can click on each image and toggle the price tab. I've used .select() instead of .find_all() to make it concise. Once you execute the script, you should get a csv file populated with the required fields.

import csv
import requests
from bs4 import BeautifulSoup

base = 'https://www.supremecommunity.com{}'
link = 'https://www.supremecommunity.com/season/fall-winter2011/overview/'

r = requests.get(link,headers={"User-Agent":"Mozilla/5.0"})
soup = BeautifulSoup(r.text,"lxml")

with open("supremecommunity.csv","w",newline="") as f:
    writer = csv.writer(f)
    writer.writerow(['item_name','item_image','upvote','downvote'])

    for card in soup.select('[class$="d-card"]'):
        item_name = card.select_one('.card__top')['data-itemname']
        item_image = base.format(card.select_one('img.prefill-img').get('data-src'))
        upvote = card.select_one('.progress-bar-success > span').get_text(strip=True)
        downvote = card.select_one('.progress-bar-danger > span').get_text(strip=True)
        writer.writerow([item_name,item_image,upvote,downvote])
        print(item_name,item_image,upvote,downvote)
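The `[class$="d-card"]` pattern above is a CSS attribute ends-with selector: it keys on the end of the serialized `class` attribute (multi-valued attributes are space-joined) rather than on any single class name, which is presumably how it catches the site's different card variants. A minimal local sketch of its behavior (the class names in this snippet are made up for illustration):

```python
from bs4 import BeautifulSoup

html = """
<div class="catalog-d-card">a</div>
<div class="extra d-card">b</div>
<div class="d-card extra">c</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Matches when the space-joined class attribute ends with "d-card":
# "catalog-d-card" and "extra d-card" qualify, "d-card extra" does not.
print([t.get_text() for t in soup.select('[class$="d-card"]')])  # ['a', 'b']
```
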
