
Scrapy program is not scraping all data

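I'm writing a program to scrape the page at https://www.trollandtoad.com/magic-the-gathering/aether-revolt/10066 but it only scrapes the first row of data and not the rest. I think this has to do with my for loop, but when I make the loop broader, it outputs too much data because every row is output multiple times.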

    def parse(self, response):
        item = GameItem()
        saved_name = ""
        for game in response.css("div.row.mt-1.list-view"):
            saved_name  = game.css("a.card-text::text").get() or saved_name
            item["Card_Name"] = saved_name.strip()
            if item["Card_Name"] != None:
                saved_name = item["Card_Name"].strip()
            else:
                item["Card_Name"] = saved_name
            yield item

Update #1

    def parse(self, response):
        for game in response.css('div.card > div.row'):
            item = GameItem()
            item["Card_Name"]  = game.css("a.card-text::text").get()
            for buying_option in game.css('div.buying-options-table div.row:not(:first-child)'):
                item["Condition"] = game.css("div.col-3.text-center.p-1::text").get()
                item["Price"] = game.css("div.col-2.text-center.p-1::text").get()
            yield item

[Image: sample output]

I think you need CSS selectors like the following (you can later use this as a basis for processing the buying-options container):

    def parse(self, response):
        for game in response.css('div.card > div.row'):
            item = GameItem()
            Card_Name = game.css("a.card-text::text").get()
            # .get() returns None when there is no match, so guard before .strip()
            item["Card_Name"] = Card_Name.strip() if Card_Name else None
            for buying_option in game.css('div.buying-options-table div.row:not(:first-child)'):
                # process buying-option
                # maybe you need to move the GameItem() initialization inside this loop
                pass  # placeholder so the loop body is valid Python

            yield item

As you can see, I moved item = GameItem() inside the loop. Also, there is no need to use saved_name here.

response.css("div.row.mt-1.list-view") returns only one selector, so the code in the loop runs only once. Try this instead: for game in response.css(".mt-1.list-view .card-text"): and you will get a list of selectors to loop over.

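Your code doesn't work because you create GameItem() outside the list loop. I must have missed the memo about these .get() and .getall() methods. Maybe someone can comment on how they differ from extract()?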

Your failing code:

    def parse(self, response):
        item = GameItem() # this line right here only creates 1 game item per page
        saved_name = ""
        # The next line fails: this selector matches the wrapper that contains all
        # the items on the page, so the loop runs only once.
        # See the code below for the corrected selector.
        for game in response.css("div.row.mt-1.list-view"):
            saved_name  = game.css("a.card-text::text").get() or saved_name
            item["Card_Name"] = saved_name.strip()
            if item["Card_Name"] != None:
                saved_name = item["Card_Name"].strip()
            else:
                item["Card_Name"] = saved_name
            yield item
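Why creating the item once per page is risky can be shown without Scrapy at all. In this sketch a plain dict stands in for GameItem (an assumption; Scrapy items are likewise mutable objects). The first generator keeps yielding the same object, so every collected result ends up holding the last value; the second creates a fresh dict per row. Whether duplicates actually appear in a Scrapy run depends on when pipelines or exporters consume each item, so treat this as an illustration of the aliasing problem:

```python
def parse_shared(names):
    item = {}  # created once, outside the loop
    for name in names:
        item["Card_Name"] = name
        yield item  # yields the same dict object every time


def parse_fresh(names):
    for name in names:
        item = {"Card_Name": name}  # a new dict for every row
        yield item


names = ["Card A", "Card B", "Card C"]
shared = list(parse_shared(names))
fresh = list(parse_fresh(names))
print(shared)  # [{'Card_Name': 'Card C'}, {'Card_Name': 'Card C'}, {'Card_Name': 'Card C'}]
print(fresh)   # [{'Card_Name': 'Card A'}, {'Card_Name': 'Card B'}, {'Card_Name': 'Card C'}]
```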

Fixed code to solve your problem:

    def parse(self, response):
        for game in response.css("div.product-col"):
            item = GameItem()
            item["Card_Name"] = game.css("a.card-text::text").get()
            if not item["Card_Name"]:
                # No card name: skip to the next product; otherwise the item is yielded.
                # Don't use a bare "return" here: inside parse() it would end the whole
                # generator, so none of the remaining products would be scraped.
                continue
            yield item
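The difference between `continue` and `return` inside a generator such as parse() can be checked with plain Python. This sketch uses hypothetical card names, with None standing for a product that has no name:

```python
def with_continue(names):
    for name in names:
        if not name:
            continue  # skip this entry and keep looping
        yield {"Card_Name": name}


def with_return(names):
    for name in names:
        if not name:
            return  # ends the whole generator; later entries are never reached
        yield {"Card_Name": name}


names = ["Card A", None, "Card B"]
print(list(with_continue(names)))  # [{'Card_Name': 'Card A'}, {'Card_Name': 'Card B'}]
print(list(with_return(names)))    # [{'Card_Name': 'Card A'}]
```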


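Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.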

 
© 2020-2024 STACKOOM.COM