Scrapy doesn't recognise xpath
I'm trying to get data from this page, https://octopart.com/electronic-parts/integrated-circuits-ics, but from the Specs button. I try to get the names of the products with this code, but it doesn't work.
import scrapy
from scrapy.http import FormRequest

from ..items import SpecItem  # the project's item class

class SpecSpider(scrapy.Spider):
    name = 'specName'
    start_urls = ['https://octopart.com/electronic-parts/integrated-circuits-ics']
    custom_settings = {
        'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter',
    }

    def parse(self, response):
        return FormRequest.from_response(
            response,
            formxpath="//form[@class='btn-group']",
            clickdata={"value": "serp-grid"},
            callback=self.scrape_pages)

    def scrape_pages(self, response):
        # open_in_browser(response)
        items = SpecItem()
        for product in response.xpath("//div[class='inner-body']/div[class='serp-wrap-all']/table[class='table-valign-middle matrix-table']"):
            name = product.xpath(".//tr/td[class='matrix-col-part']/a[class='nowrap']/text()").extract()
            items['ProductName'] = ''.join(name).strip()
            price = product.xpath("//tr/td['4']/div[class='small']/text()").extract()
            items['Price'] = ''.join(price).strip()
            yield items
This XPath:

response.xpath("//div[class='inner-body']/div[class='serp-wrap-all']/table[class='table-valign-middle matrix-table']")

doesn't work.

Any suggestions?
You are using the wrong XPath syntax!

//div[class='inner-body']/div[class='serp-wrap-all']/table[class='table-valign-middle matrix-table']
The correct format is to add "@" before "class":

//div[@class='inner-body']/div[@class='serp-wrap-all']/..
And there is no 'matrix-table' table at the above link. Try using something like:

//div[@class='inner-body']/div[@class='serp-wrap-all']//*[contains(@class,'matrix-table')]
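To see why the "@" matters, here is a minimal, self-contained sketch. It uses lxml directly on a made-up HTML fragment (not the live Octopart page or a real Scrapy response), so the element contents are illustrative only:

```python
from lxml import html

doc = html.fromstring("""
<div class="inner-body">
  <div class="serp-wrap-all">
    <table class="table-valign-middle matrix-table">
      <tr><td class="matrix-col-part"><a class="nowrap">LM358</a></td></tr>
    </table>
  </div>
</div>
""")

# Without "@", [class='inner-body'] looks for a child *element* named
# "class" whose text is 'inner-body' -- no such element exists here.
wrong = doc.xpath("//div[class='inner-body']")

# With "@", the predicate tests the class *attribute*; contains() also
# matches elements carrying several space-separated classes.
right = doc.xpath(
    "//div[@class='inner-body']/div[@class='serp-wrap-all']"
    "//*[contains(@class, 'matrix-table')]")

print(len(wrong), len(right))  # prints: 0 1
```

The same two expressions behave identically inside `response.xpath(...)` in Scrapy, since Scrapy selectors are built on the same XPath engine.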
If you want just the top-level product name, use the CSS selector

.serp-card-pdp-link

and extract the text.
The median price comes from the CSS selector

.avg-price-faux-btn
You can apply CSS selectors with Scrapy using .css(selector).