scrapy css 選擇器返回 None 然后找到值

Question

所以基本上我將這部分添加到我的代碼中，我不知道發生了什么。 這是我使用https://www.digikey.com/products/en?keywords=ID82C55的鏈接都在同一個過程中： - 所以我的 css 選擇器沒有返回。 - 然后它發現幾個 html 元素返回其中一些。 - 然后找到最后一個元素。

所以這導致我的程序混合匹配數據並將其錯誤地生成到我的 csv 文件中。 如果有人能告訴我這里有什么問題嗎？ 謝謝。

代碼

def parse(self, response):

            
            for b in response.css('div#pdp_content.product-details > div'):

                if b.css('div.product-details-headline h1::text').get():
                    part = b.css('div.product-details-headline h1::text').get()
                    part = part.strip()
                    parts1 = part
                    print(b.css('div.product-details-headline h1::text').get())
                    print(parts1)

                else:
                    print(b.css('div.product-details-headline h1::text').get())

                if b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(3)::text').get():
                    cleaned_quantity = b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(3)::text').get()
                    print(cleaned_quantity)
                else:
                    print(b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(3)::text').get())
                if b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(2)::text').get():
                    cleaned_price = b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(2)::text').get()
                    print(cleaned_price)

                else:
                    print(b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(2)::text').get())
                if b.css('div.quantity-message span#dkQty::text').get():
                    cleaned_stock = b.css('div.quantity-message span#dkQty::text').get()
                    print(cleaned_stock)

                else:
                    print(b.css('div.quantity-message span#dkQty::text').get())

                if b.css('table#product-attribute-table > tr:nth-child(7) td::text').get():
                    status = b.css('table#product-attribute-table > tr:nth-child(7) td::text').get()
                    status = status.strip()
                    cleaned_status = status
                    print(cleaned_status)

                else:
                    print(b.css('table#product-attribute-table > tr:nth-child(7) td::text').get())

                # yield {
                #     'Part': parts1,
                #     'Quantity': cleaned_quantity,
                #     'Price': cleaned_price,
                #     'Stock': cleaned_stock,
                #     'Status': cleaned_status,
                # }

Output

None
None
None
None
None
None
2,500
29.10828
29
None

                                ID82C55A
                            
ID82C55A
None
None
None
Active

Answer 1

我強烈建議您切換到 XPath 表達式：

part_number = b.xpath('.//th[.="Manufacturer Part Number"]/following-sibling::td[1]/text()').get()
stock = b.xpath('.//span[.="In Stock"]/preceding-sibling::span[1]/text()').get()
etc.

scrapy css 選擇器返回 None 然后找到值

問題描述

1 個解決方案

解決方案1
3 已采納 2020-08-14 13:44:43

scrapy css 選擇器返回 None 然后找到值

問題描述

1 個解決方案

解決方案1 3 已采納 2020-08-14 13:44:43

解決方案1
3 已采納 2020-08-14 13:44:43