Scrapy: separating items with response.xpath

Question

I am trying to scrape a web page that has multiple blog entries on the first page.
This is my code so far:

for rel in response.xpath('//*[@id="content"]/div[*]/div/comment()[2]'):
    item = Example()
    item['title'] = rel.xpath('//*[@id="content"]/div[*]/div/div/input/@value').extract()
    item['link'] = rel.xpath('//*[@id="content"]/div[*]/div/div/span[4]/a/@href').extract()
    yield item

The problem is that if I use "*", I get back one title and one link, each containing the values from all entries.
What I want is a separate title and link for every single entry.
I am very new to Python and Scrapy and don't know how to count up to get the individual entries back.
The first entry is at position 2 and each following entry is 3 higher, ending at 29 (2, 5, 8, ..., 29).
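
For illustration, the counting described above (2, 5, 8, ..., 29) can be generated with Python's built-in range. This is only a sketch: the names TITLE_XPATH, LINK_XPATH, and positions are made up here, and the approach assumes the page always places entries at exactly those positions, which makes it brittle.

```python
# XPath templates taken from the question, with the "*" replaced by a placeholder.
TITLE_XPATH = '//*[@id="content"]/div[{n}]/div/div/input/@value'
LINK_XPATH = '//*[@id="content"]/div[{n}]/div/div/span[4]/a/@href'

# Entries sit at positions 2, 5, 8, ..., 29: start at 2, step by 3, stop before 30.
positions = list(range(2, 30, 3))

# One concrete XPath per entry; each could be passed to response.xpath(...).
title_paths = [TITLE_XPATH.format(n=n) for n in positions]
link_paths = [LINK_XPATH.format(n=n) for n in positions]
```

Each generated path selects a single entry, so a spider could loop over positions and build one item per position.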

Answer 1

Let me suggest more explicit XPaths. Something like this should be closer to your goal:

for rel in response.xpath('//div[@class="beschreibung"]'):
    item = Example()  # create a fresh item for each entry, as in the question
    # The leading "." keeps each query relative to the current entry.
    item['title'] = rel.xpath('.//strong[contains(text(),"Release")]/following-sibling::*[1]/@value').extract()
    item['link'] = rel.xpath('.//span[@style="display:inline;"]//a[contains(text(),"Share")]/@href').extract()
    yield item

Answer 1 accepted, 2016-03-02 14:38:21
