
Having issues with my first python web scraper

I am trying to scrape information from a site called Supreme Community for a personal project. They have archives of items from different seasons, and I am trying to get that information into a csv. My code includes a loop I took from this github page: https://github.com/CharlieAIO/Supreme-Community-Scraper/blob/master/sup.py

Below is the code I am using. It runs without errors, but the csv stays empty apart from the headers I set. Am I doing something wrong here? Is the website rejecting my requests? Any help or direction is appreciated.

import requests
from bs4 import BeautifulSoup as bs

urls = ['https://www.supremecommunity.com/season/fall-winter2011/overview/']
open("SupremeData.csv","w")

filename = "SupremeData.csv"

headers = "Item,Image,Price,Upvotes,Downvotes"

f = open(filename, "w")
f.write(headers)


for link in urls:

    r = requests.get(link)
    soup = bs(r.text,"html.parser")
    cards = soup.find_all('div',{'class':'card card-2'})

    for card in cards:
        item = card.find("div",{"class":"card-details"})["data-itemname"]
        img = card.find("img",{"class":"prefill-img"})["src"]
        image = f'https://supremecommunity.com{img}'
        price = card.find("span",{"class":"label-price"}).text
        price = price.replace(" ","")
        price = price.replace("\n","")
        upvotes = card.find("p",{"class":"upvotes hidden"}).text
        downvotes = card.find("p",{"class":"downvotes hidden"}).text

        f.write(item + "," + image + "," + price + "," + upvotes + "," + downvotes + "\n")


f.close()    
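
To check whether the request itself is being rejected or the selector simply matches nothing, a quick diagnostic along these lines can help (a minimal sketch reusing the URL, imports, and selector from the code above):

import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.supremecommunity.com/season/fall-winter2011/overview/'
r = requests.get(url)
print(r.status_code)   # 200 means the site accepted the request

soup = bs(r.text, "html.parser")
cards = soup.find_all('div', {'class': 'card card-2'})
print(len(cards))      # 0 means the class name does not match the page's markup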

This code saves the csv file along with some details of the products.

import requests
from bs4 import BeautifulSoup as bs

urls = ['https://www.supremecommunity.com/season/fall-winter2011/overview/']

filename = "SupremeData.csv"

headers = "Item,Image,Price,Upvotes,Downvotes"

f = open(filename, "w")
f.write(headers + "\n")  # end the header row with a newline so the first data row starts on its own line


for link in urls:

    r = requests.get(link)
    soup = bs(r.content, "html.parser")
    # The product cards carry the class "card-2" (not "card card-2")
    cards = soup.find_all('div', {'class': 'card-2'})

    for card in cards:
        # The item name sits in the data-itemname attribute of the card__top div
        item = card.find("div", {"class": "card__top"})["data-itemname"]
        img = card.find("img", {"class": "prefill-img"})["src"]
        image = f'https://supremecommunity.com{img}'
        try:
            price = card.find("span", {"class": "label-price"}).text
            price = price.replace(" ", "").replace("\n", "")
        except AttributeError:
            # Some cards have no price on the overview page
            price = 'Not Available'
        try:
            upvotes = card.find("p", {"class": "upvotes hidden"}).text
            downvotes = card.find("p", {"class": "downvotes hidden"}).text
        except AttributeError:
            upvotes = 'Not Found'
            downvotes = 'Not Found'
        print(item + "," + image + "," + price + "," + upvotes + "," + downvotes)
        f.write(item + "," + image + "," + price + "," + upvotes + "," + downvotes + "\n")


f.close()

Here is one way you can grab all of the above fields from that webpage except for the price, since prices are not available on that page yet. To check, you can click on each image and toggle the price tab. I used .select() instead of .find_all() to keep it concise. Once the script is executed, you should get a csv file with data in the appropriate fields.

import csv
import requests
from bs4 import BeautifulSoup

base = 'https://www.supremecommunity.com{}'
link = 'https://www.supremecommunity.com/season/fall-winter2011/overview/'

# Send a browser-like User-Agent with the request
r = requests.get(link, headers={"User-Agent": "Mozilla/5.0"})
soup = BeautifulSoup(r.text, "lxml")

with open("supremecommunity.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(['item_name', 'item_image', 'upvote', 'downvote'])

    # Each product card has a class ending in "d-card"
    for card in soup.select('[class$="d-card"]'):
        item_name = card.select_one('.card__top')['data-itemname']
        # The image URL is kept in the data-src attribute and is relative to the site root
        item_image = base.format(card.select_one('img.prefill-img').get('data-src'))
        upvote = card.select_one('.progress-bar-success > span').get_text(strip=True)
        downvote = card.select_one('.progress-bar-danger > span').get_text(strip=True)
        writer.writerow([item_name, item_image, upvote, downvote])
        print(item_name, item_image, upvote, downvote)
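
If you want to sanity-check the result, the generated file can be read back with the standard csv module (a small usage sketch; the filename and column names match the ones written above):

import csv

with open("supremecommunity.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row["item_name"], row["upvote"], row["downvote"])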

