Scrapy program is not scraping all data

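I am writing a program to scrape the following page, https://www.trollandtoad.com/magic-the-gathering/aether-revolt/10066, but it only scrapes the first row of data and none of the rest. I think this has something to do with my for loop, but when I change the loop to a broader selector it outputs too much data, because each row of data is output multiple times.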

    def parse(self, response):
        item = GameItem()
        saved_name = ""
        for game in response.css("div.row.mt-1.list-view"):
            saved_name  = game.css("a.card-text::text").get() or saved_name
            item["Card_Name"] = saved_name.strip()
            if item["Card_Name"] != None:
                saved_name = item["Card_Name"].strip()
            else:
                item["Card_Name"] = saved_name
            yield item

Update #1

    def parse(self, response):
        for game in response.css('div.card > div.row'):
            item = GameItem()
            item["Card_Name"]  = game.css("a.card-text::text").get()
            for buying_option in game.css('div.buying-options-table div.row:not(:first-child)'):
                item["Condition"] = game.css("div.col-3.text-center.p-1::text").get()
                item["Price"] = game.css("div.col-2.text-center.p-1::text").get()
            yield item

I think you need something like the CSS below (you can later use it as a base for processing the buying-options container):

    def parse(self, response):
        for game in response.css('div.card > div.row'):
            item = GameItem()
            card_name = game.css("a.card-text::text").get()
            if not card_name:
                continue  # .get() returns None when nothing matches, so skip this row
            item["Card_Name"] = card_name.strip()
            for buying_option in game.css('div.buying-options-table div.row:not(:first-child)'):
                # process buying_option here
                # you may need to move the GameItem() initialization inside this loop
                pass  # placeholder so the empty loop body is valid Python

            yield item

As you can see, I moved item = GameItem() inside the loop. Also, there is no need for saved_name here.

response.css("div.row.mt-1.list-view") returns only 1 selector, so the code in the loop runs only once. Try this instead: for game in response.css(".mt-1.list-view .card-text"): and you will get a list of selectors to loop over.

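Your code does not work because you create the GameItem() outside of the list loop. I must have missed the memo about these .get() and .getall() methods. Maybe someone can comment on how they differ from extract()?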

Your failing code:

    def parse(self, response):
        item = GameItem()  # this line creates only 1 game item per page
        saved_name = ""
        # This selector matches the single wrapper around all items on the page,
        # so the loop body runs only once. See the code below for the corrected selector.
        for game in response.css("div.row.mt-1.list-view"):
            saved_name  = game.css("a.card-text::text").get() or saved_name
            item["Card_Name"] = saved_name.strip()
            if item["Card_Name"] != None:
                saved_name = item["Card_Name"].strip()
            else:
                item["Card_Name"] = saved_name
            yield item
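
Why creating the item once per page is risky can be shown without Scrapy. In the sketch below a plain dict stands in for GameItem(), and the function names are hypothetical. It only demonstrates the aliasing: whether a real crawl shows overwritten values also depends on when each yielded item gets processed.

```python
def parse_shared(names):
    item = {}  # created once, like item = GameItem() before the loop
    for name in names:
        item["Card_Name"] = name
        yield item  # every yield hands out the same object


def parse_fresh(names):
    for name in names:
        item = {}  # a new object for every row
        item["Card_Name"] = name
        yield item


names = ["Aether Hub", "Fatal Push"]
shared = list(parse_shared(names))
fresh = list(parse_fresh(names))

shared_names = [entry["Card_Name"] for entry in shared]
fresh_names = [entry["Card_Name"] for entry in fresh]
```

Once collected, every entry in `shared` is the same dict and carries the last name written to it, while `fresh` keeps one distinct entry per row.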

Fixed code that solves your problem:

    def parse(self, response):
        for game in response.css("div.product-col"):
            item = GameItem()
            item["Card_Name"] = game.css("a.card-text::text").get()
            if not item["Card_Name"]:
                # No card name: skip this product and move on to the next one.
                # A bare "return" would instead end parse() for the whole page,
                # so use it only if nothing after this point should run.
                continue
            yield item
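
The difference between continue and return inside a generator such as parse() can be seen with plain Python. The function names and sample list below are hypothetical and stand in for the rows on a page.

```python
def with_continue(names):
    for name in names:
        if not name:
            continue  # skip this row, keep iterating
        yield name


def with_return(names):
    for name in names:
        if not name:
            return  # stop the generator; later rows are never reached
        yield name


names = ["Aether Hub", None, "Fatal Push"]
kept_continue = list(with_continue(names))
kept_return = list(with_return(names))
```

Here `kept_continue` still contains the row after the empty one, whereas `kept_return` stops at the first empty row.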

