
Scrapy CSS selector returns None, then finds the value

I am adding this portion to my code and I have no idea what is going on. This is the link I am using: https://www.digikey.com/products/en?keywords=ID82C55

All of this happens in the same process:
- My CSS selector returns None.
- Then it finds a couple of the HTML elements and returns some of them.
- Then it finds the last element.

This causes my program to mix up the data and yield it incorrectly to my CSV file. Could anyone tell me what the problem is here? Thanks.

Code

def parse(self, response):
    for b in response.css('div#pdp_content.product-details > div'):

        if b.css('div.product-details-headline h1::text').get():
            part = b.css('div.product-details-headline h1::text').get()
            part = part.strip()
            parts1 = part
            print(b.css('div.product-details-headline h1::text').get())
            print(parts1)
        else:
            print(b.css('div.product-details-headline h1::text').get())

        if b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(3)::text').get():
            cleaned_quantity = b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(3)::text').get()
            print(cleaned_quantity)
        else:
            print(b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(3)::text').get())

        if b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(2)::text').get():
            cleaned_price = b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(2)::text').get()
            print(cleaned_price)
        else:
            print(b.css('table.product-dollars > tr:nth-last-child(1) td:nth-last-child(2)::text').get())

        if b.css('div.quantity-message span#dkQty::text').get():
            cleaned_stock = b.css('div.quantity-message span#dkQty::text').get()
            print(cleaned_stock)
        else:
            print(b.css('div.quantity-message span#dkQty::text').get())

        if b.css('table#product-attribute-table > tr:nth-child(7) td::text').get():
            status = b.css('table#product-attribute-table > tr:nth-child(7) td::text').get()
            status = status.strip()
            cleaned_status = status
            print(cleaned_status)
        else:
            print(b.css('table#product-attribute-table > tr:nth-child(7) td::text').get())

        # yield {
        #     'Part': parts1,
        #     'Quantity': cleaned_quantity,
        #     'Price': cleaned_price,
        #     'Stock': cleaned_stock,
        #     'Status': cleaned_status,
        # }

Output

None
None
None
None
None
None
2,500
29.10828
29
None

                                ID82C55A
                            
ID82C55A
None
None
None
Active

I highly recommend switching to XPath expressions that locate each value by its label rather than by its position:

part_number = b.xpath('.//th[.="Manufacturer Part Number"]/following-sibling::td[1]/text()').get()
stock = b.xpath('.//span[.="In Stock"]/preceding-sibling::span[1]/text()').get()
and so on for the remaining fields.


 