
Scrapy: response.xpath prints None, but upon clicking into weblink, xPath is correct

I am trying to print the h1 title of the item I am scraping. I tried print(response.xpath('/html/body/div[2]/div/div[5]/div[2]/div[2]/div/h1').get()), which prints None for a product such as https://www.steinersports.com/football/tampa-bay-buccaneers/tom-brady-tampa-bay-buccaneers-super-bowl-lv-champions-autographed-white-nike-game-jersey-with-lv-mvp-inscription/o-8094+t-92602789+p-2679909745+z-8-2492872768?_ref=p-FALP:m-GRID:i-r20c0:po-60

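I don't know how to go about debugging this, because when I open a link that returned None and inspect the XPath in the browser, it is correct. Any help is appreciated. Full code below: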

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from scrapy.http import Request


class SteinerSportsCrawlSpiderSpider(CrawlSpider):
    name = 'steinersports_crawl_spider'
    allowed_domains = ['steinersports.com']
    start_urls = [
        'https://www.steinersports.com/football/signed/o-1383+fa-56+z-95296299-3058648695?_ref=m-TOPNAV',
        ]
    base_url = 'https://www.steinersports.com/football/signed/o-1383+fa-56+z-95296299-3058648695?_ref=m-TOPNAV'



    rules = (
        Rule(LinkExtractor(allow=r'/signed'), follow=True),
        Rule(LinkExtractor(allow=r'football/', deny=r'/signed'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        item = {}
        description_flag = True
        price_flag = True
        item_description = response.xpath('/html/body/div[2]/div/div[5]/div[2]/div[17]/div/div[2]/div').get()
        print(item)
        #item_price = response.xpath('//span[@class="product__price"]/text()').get()
        
        print(response.xpath('html/body/div[2]/div/div[5]/div[2]/div[2]/div/h1').get())
        item['item_name'] = response.xpath('html/body/div[2]/div/div[5]/div[2]/div[2]/div/h1').get()
        
        
        return item

You can access the h1 tag directly through its data-talos attribute. This XPath should get the title:

response.xpath("//h1[@data-talos='labelPdpProductTitle']/text()").extract_first()
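The answer does not say why the absolute path returns None. One common cause, assumed here rather than confirmed for this site, is that the HTML Scrapy downloads differs from the DOM the browser shows after scripts have run, so positional steps such as div[2] point at a different element. The sketch below illustrates that idea with two made-up markup snippets, using the standard library's xml.etree.ElementTree as a stand-in for Scrapy's selectors. The snippets, the title text, and the helper name `title` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup as a browser might show it after scripts have run:
# an extra banner <div> sits before the product container.
BROWSER_DOM = """
<html><body>
  <div>banner inserted by script</div>
  <div><div><h1 data-talos="labelPdpProductTitle">Signed Jersey</h1></div></div>
</body></html>
"""

# Hypothetical markup as the server might send it to a crawler: no banner.
RAW_HTML = """
<html><body>
  <div><div><h1 data-talos="labelPdpProductTitle">Signed Jersey</h1></div></div>
</body></html>
"""

# Positional path, in the style of one copied from browser dev tools.
POSITIONAL = "./body/div[2]/div/h1"
# Attribute-based path, in the style of the answer above.
BY_ATTRIBUTE = ".//h1[@data-talos='labelPdpProductTitle']"


def title(markup, path):
    """Return the text of the first node matching path, or None."""
    root = ET.fromstring(markup.strip())
    node = root.find(path)
    return node.text if node is not None else None


results = {
    "browser/positional": title(BROWSER_DOM, POSITIONAL),
    "raw/positional": title(RAW_HTML, POSITIONAL),
    "browser/attribute": title(BROWSER_DOM, BY_ATTRIBUTE),
    "raw/attribute": title(RAW_HTML, BY_ATTRIBUTE),
}
for name, value in results.items():
    print(name, "->", value)
```

In this sketch the positional path matches only the browser-style markup, while the attribute-based path matches both. To check what Scrapy actually receives for a real page, one option is to run scrapy shell with the product URL and evaluate response.xpath(...) there, or call view(response) to open the downloaded HTML in a browser.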

