Scrapy program is not scraping all data
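I am writing a program to scrape the following page, https://www.trollandtoad.com/magic-the-gathering/aether-revolt/10066 , but it only scrapes the first row of data and not the rest. I think this has something to do with my for loop, but when I make the loop broader, it outputs far too much data because every row is output multiple times.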
```python
def parse(self, response):
    item = GameItem()
    saved_name = ""
    for game in response.css("div.row.mt-1.list-view"):
        saved_name = game.css("a.card-text::text").get() or saved_name
        item["Card_Name"] = saved_name.strip()
        if item["Card_Name"] != None:
            saved_name = item["Card_Name"].strip()
        else:
            item["Card_Name"] = saved_name
        yield item
```
Update #1
```python
def parse(self, response):
    for game in response.css('div.card > div.row'):
        item = GameItem()
        item["Card_Name"] = game.css("a.card-text::text").get()
        for buying_option in game.css('div.buying-options-table div.row:not(:first-child)'):
            item["Condition"] = game.css("div.col-3.text-center.p-1::text").get()
            item["Price"] = game.css("div.col-2.text-center.p-1::text").get()
            yield item
```
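In Update #1, the inner loop has two problems. It reads Condition and Price through `game.css(...)`, which returns the first match anywhere in the card on every iteration. It also keeps refilling the single `item` created for that card.

The following is a plain-Python sketch of the intended shape, not Scrapy code. Plain dicts stand in for selectors and items. The `name`, `options`, `condition` and `price` keys and the sample values are made up for illustration. In the real spider, the inner lookups would presumably go through `buying_option.css(...)` rather than `game.css(...)`.

```python
from typing import Dict, Iterator, List

def parse_cards(cards: List[dict]) -> Iterator[Dict[str, str]]:
    """Yield one row per (card, buying option); `cards` is hypothetical pre-parsed data."""
    for card in cards:
        name = (card.get("name") or "").strip()
        for option in card.get("options", []):
            # A fresh dict per buying option, so earlier rows are never overwritten.
            item = {"Card_Name": name}
            # Read from the option itself, not the enclosing card.
            item["Condition"] = option["condition"]
            item["Price"] = option["price"]
            yield item

# Made-up sample data: one card with two buying options.
cards = [
    {"name": " Card A ", "options": [
        {"condition": "NM", "price": "$1.00"},
        {"condition": "LP", "price": "$0.80"},
    ]},
]
for row in parse_cards(cards):
    print(row)
```

Each buying option produces its own row, and each row carries its own condition and price.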
I think you need CSS selectors like the following (you can later use this as the basis for processing the `buying-options` container):
```python
def parse(self, response):
    for game in response.css('div.card > div.row'):
        item = GameItem()
        # get(default="") avoids calling .strip() on None when a row has no card link
        Card_Name = game.css("a.card-text::text").get(default="")
        item["Card_Name"] = Card_Name.strip()
        for buying_option in game.css('div.buying-options-table div.row:not(:first-child)'):
            # process buying-option
            # maybe you need to move GameItem() initialization inside this loop
            pass
        yield item
```
As you can see, I moved `item = GameItem()` inside the loop. Also, there is no need for `saved_name` here.
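The reason for moving `GameItem()` inside the loop can be shown without Scrapy. A plain dict stands in for the item here. When one object is created before the loop and yielded repeatedly, every yield is the same object. Anything that keeps references to the yielded items therefore sees only the last values written.

```python
def shared_item(names):
    item = {}  # created once, outside the loop (the bug)
    for name in names:
        item["Card_Name"] = name
        yield item  # the same dict every time

def fresh_item(names):
    for name in names:
        item = {"Card_Name": name}  # a new dict per iteration
        yield item

print(list(shared_item(["A", "B", "C"])))  # [{'Card_Name': 'C'}, {'Card_Name': 'C'}, {'Card_Name': 'C'}]
print(list(fresh_item(["A", "B", "C"])))   # [{'Card_Name': 'A'}, {'Card_Name': 'B'}, {'Card_Name': 'C'}]
```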
`response.css("div.row.mt-1.list-view")` returns only one selector, so the code inside the loop runs only once. Try this instead: `for game in response.css(".mt-1.list-view .card-text"):`
You will then get a list of selectors to loop over.
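The difference between selecting the wrapper and selecting each product can be illustrated with the standard library. This is a stand-in, not Scrapy. `xml.etree.ElementTree` supports only a small XPath subset and no CSS selectors, and real pages are usually not well-formed XML. The markup below is a hypothetical, simplified version of the page. It reuses the `row mt-1 list-view` classes from the question and the `product-col` class from the last answer.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: one wrapper div around several product cards.
PAGE = """<html><body>
  <div class="row mt-1 list-view">
    <div class="product-col"><a class="card-text">Card A</a></div>
    <div class="product-col"><a class="card-text">Card B</a></div>
    <div class="product-col"><a class="card-text">Card C</a></div>
  </div>
</body></html>"""

root = ET.fromstring(PAGE)
# Selecting the wrapper matches one element, so a loop over it runs once.
wrappers = root.findall(".//div[@class='row mt-1 list-view']")
# Selecting each product matches one element per card.
cards = root.findall(".//div[@class='product-col']")

print(len(wrappers))                       # 1
print([c.find("a").text for c in cards])   # ['Card A', 'Card B', 'Card C']
```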
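Your code doesn't work because you create `GameItem()` outside the loop over the list. I must have missed the memo about these `.get()` and `.getall()` methods. Maybe someone can comment on how they differ from `extract()`?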
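On the question about `.get()` and `.getall()`: on a Scrapy `SelectorList`, `.get()` returns the first match, or a `default` (`None` unless given) when nothing matches. `.getall()` returns every match as a list. The older names still work: `.extract_first()` corresponds to `.get()` and `.extract()` to `.getall()`.

The class below is a hypothetical toy model of that documented behaviour, not Scrapy's implementation. Its elements are already-extracted strings.

```python
from typing import List, Optional

class ToySelectorList(list):
    """Hypothetical stand-in for SelectorList; holds already-extracted strings."""

    def get(self, default: Optional[str] = None) -> Optional[str]:
        # First match, or `default` when there is no match.
        return self[0] if self else default

    def getall(self) -> List[str]:
        # Every match as a (possibly empty) list.
        return list(self)

    # Older names, kept as aliases.
    extract_first = get
    extract = getall

names = ToySelectorList(["Card A", "Card B"])
empty = ToySelectorList()
print(names.get())             # Card A
print(names.getall())          # ['Card A', 'Card B']
print(empty.get())             # None
print(empty.get(default=""))   # prints an empty line
```

The `default` argument is handy before calling `.strip()`, which would fail on `None`.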
Your failing code:
```python
def parse(self, response):
    item = GameItem()  # this line right here only creates 1 game item per page
    saved_name = ""
    for game in response.css("div.row.mt-1.list-view"):  # this selector matches the single wrapper div around all the items, so the loop runs only once. See the corrected selector below.
        saved_name = game.css("a.card-text::text").get() or saved_name
        item["Card_Name"] = saved_name.strip()
        if item["Card_Name"] != None:
            saved_name = item["Card_Name"].strip()
        else:
            item["Card_Name"] = saved_name
        yield item
```
Fixed code that solves your problem:
```python
def parse(self, response):
    for game in response.css("div.product-col"):
        item = GameItem()
        item["Card_Name"] = game.css("a.card-text::text").get()
        if not item["Card_Name"]:
            # No card name: skip to the next product. A bare "return" here would
            # end the whole generator, so no later products would be yielded.
            continue
        yield item
```
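The difference between `continue` and `return` inside a generator such as `parse` is shown below. This is a plain-Python sketch with made-up input, where `None` stands for a product without a card name. `continue` skips one entry, while `return` stops the generator entirely.

```python
from typing import Iterator, List, Optional

def with_continue(names: List[Optional[str]]) -> Iterator[str]:
    for name in names:
        if not name:
            continue  # skip only this entry
        yield name

def with_return(names: List[Optional[str]]) -> Iterator[str]:
    for name in names:
        if not name:
            return  # ends the whole generator
        yield name

rows = ["Card A", None, "Card B"]
print(list(with_continue(rows)))  # ['Card A', 'Card B']
print(list(with_return(rows)))    # ['Card A']
```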