Scrapy: response.xpath prints None, but when I open the link in a browser, the XPath is correct
I'm trying to print the h1 title of the item I'm scraping. I tried
print(response.xpath('/html/body/div[2]/div/div[5]/div[2]/div[2]/div/h1').get())
but it prints None for products like this one: https://www.steinersports.com/football/tampa-bay-buccaneers/tom-brady-tampa-bay-buccaneers-super-bowl-lv-champions-autographed-white-nike-game-jersey-with-lv-mvp-inscription/o-8094+t-92602789+p-2679909745+z-8-2492872768?_ref=p-FALP:m-GRID:i-r20c0:po-60
I don't know how to go about debugging this, because when I click into a link that returned None and inspect the XPath in the browser, it is correct. Any help is appreciated. Full code below:
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from scrapy.http import Request


class SteinerSportsCrawlSpiderSpider(CrawlSpider):
    name = 'steinersports_crawl_spider'
    allowed_domains = ['steinersports.com']
    start_urls = [
        'https://www.steinersports.com/football/signed/o-1383+fa-56+z-95296299-3058648695?_ref=m-TOPNAV',
    ]
    base_url = 'https://www.steinersports.com/football/signed/o-1383+fa-56+z-95296299-3058648695?_ref=m-TOPNAV'
    rules = (
        Rule(LinkExtractor(allow=r'/signed'), follow=True),
        Rule(LinkExtractor(allow=r'football/', deny=r'/signed'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        item = {}
        description_flag = True
        price_flag = True
        item_description = response.xpath('/html/body/div[2]/div/div[5]/div[2]/div[17]/div/div[2]/div').get()
        print(item)
        #item_price = response.xpath('//span[@class="product__price"]/text()').get()
        print(response.xpath('html/body/div[2]/div/div[5]/div[2]/div[2]/div/h1').get())
        item['item_name'] = response.xpath('html/body/div[2]/div/div[5]/div[2]/div[2]/div/h1').get()
        return item
You can use the data-talos attribute to access the h1 tag directly. This XPath should get the title:
response.xpath("//h1[@data-talos='labelPdpProductTitle']/text()").extract_first()