
Scrapy CSS selector returns None, then finds the value

I am adding this portion to my code and I have no clue what's going on. This is the link I am using: https://www.digikey.com/products/en?keywords=ID82C55

All in the same process:
- My CSS selector returns None.
- Then it finds a couple of the HTML elements and returns some of them.
- Then it finds the last element.

This causes my program to mismatch the data and yield it incorrectly to my CSV file. Could anyone tell me what the problem is here? Thanks.

Code

def parse(self, response):

            
            for b in response.css('div#pdp_content.product-details > div'):

                if b.css('div.product-details-headline h1::text').get():
                    part = b.css('div.product-details-headline h1::text').get()
                    part = part.strip()
                    parts1 = part
                    print(b.css('div.product-details-headline h1::text').get())
                    print(parts1)

                else:
                    print(b.css('div.product-details-headline h1::text').get())

                if b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(3)::text').get():
                    cleaned_quantity = b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(3)::text').get()
                    print(cleaned_quantity)
                else:
                    print(b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(3)::text').get())
                if b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(2)::text').get():
                    cleaned_price = b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(2)::text').get()
                    print(cleaned_price)

                else:
                    print(b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(2)::text').get())
                if b.css('div.quantity-message span#dkQty::text').get():
                    cleaned_stock = b.css('div.quantity-message span#dkQty::text').get()
                    print(cleaned_stock)

                else:
                    print(b.css('div.quantity-message span#dkQty::text').get())

                if b.css('table#product-attribute-table > tr:nth-child(7) td::text').get():
                    status = b.css('table#product-attribute-table > tr:nth-child(7) td::text').get()
                    status = status.strip()
                    cleaned_status = status
                    print(cleaned_status)

                else:
                    print(b.css('table#product-attribute-table > tr:nth-child(7) td::text').get())

                # yield {
                #     'Part': parts1,
                #     'Quantity': cleaned_quantity,
                #     'Price': cleaned_price,
                #     'Stock': cleaned_stock,
                #     'Status': cleaned_status,
                # }

Output

None
None
None
None
None
None
2,500
29.10828
29
None

                                ID82C55A
                            
ID82C55A
None
None
None
Active

I highly recommend switching to XPath expressions:

part_number = b.xpath('.//th[.="Manufacturer Part Number"]/following-sibling::td[1]/text()').get()
stock = b.xpath('.//span[.="In Stock"]/preceding-sibling::span[1]/text()').get()
etc.
