
Scrapy - response.xpath get items back separated

I am trying to scrape a webpage that has multiple blog entries on its first page.
This is my code so far:

for rel in response.xpath('//*[@id="content"]/div[*]/div/comment()[2]'):
    item = Example()
    item['title'] = rel.xpath('//*[@id="content"]/div[*]/div/div/input/@value').extract()
    item['link'] = rel.xpath('//*[@id="content"]/div[*]/div/div/span[4]/a/@href').extract()
    yield item

The problem is that with the "*" I get back one link and one title, each containing all of the entries.
But I would like to get a title and a link for every single entry.
I am very new to Python and Scrapy and don't know how to count up to get the individual entries back.
The first entry starts at "2" and each following one is +3, until it ends at 29 (2, 5, 8, ..., 29).
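If you did want to count the positions as described, Python's range() can generate them. This is a minimal sketch that assumes the numbers are the index that goes where "div[*]" stands in the original XPath; the entry_xpath helper and its template are hypothetical.

```python
# Positions described in the question: start at 2, step by 3, last one is 29.
# range() excludes its stop value, so stop=30 is needed to include 29.
ENTRY_POSITIONS = list(range(2, 30, 3))


def entry_xpath(position: int) -> str:
    """Hypothetical helper: build the title XPath for one entry.

    Assumes the position replaces the "*" in div[*]; adjust to the real markup.
    """
    return f'//*[@id="content"]/div[{position}]/div/div/input/@value'


for pos in ENTRY_POSITIONS:
    print(entry_xpath(pos))
```

Hard-coding positions like this breaks as soon as the page layout shifts, which is why selecting each entry and querying relative to it is usually the sturdier approach.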

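Let me suggest more explicit XPaths. The key point is that an XPath starting with "//" searches the whole document even when you call it on rel; starting it with ".//" searches only inside the current entry. Something like this should be closer to your goal: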

for rel in response.xpath('//div[@class="beschreibung"]'):
    item = Example()  # create a fresh item for every entry
    # the leading "." keeps each query relative to the current entry (rel)
    item['title'] = rel.xpath('.//strong[contains(text(),"Release")]/following-sibling::*[1]/@value').extract()
    item['link'] = rel.xpath('.//span[@style="display:inline;"]//a[contains(text(),"Share")]/@href').extract()
    yield item
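To see why the relative path matters, here is a small self-contained sketch. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors, and the markup is a hypothetical, simplified version of the page. Searching from the document root collects every entry's title at once (like "//" inside the loop), while searching from each entry gives one title and one link per entry (like ".//").

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: two blog entries, each with a title and a link.
PAGE = """
<div id="content">
  <div class="beschreibung">
    <input value="Release A"/>
    <span><a href="/a">Share</a></span>
  </div>
  <div class="beschreibung">
    <input value="Release B"/>
    <span><a href="/b">Share</a></span>
  </div>
</div>
"""

root = ET.fromstring(PAGE)

# Query from the top of the document: returns the titles of ALL entries,
# which is what the question's "//..." paths did on every loop iteration.
all_titles = [el.get("value") for el in root.findall(".//input")]

# Query relative to each entry: one title and one link per entry.
items = []
for entry in root.findall(".//div[@class='beschreibung']"):
    items.append({
        "title": entry.find(".//input").get("value"),
        "link": entry.find(".//a").get("href"),
    })

print(all_titles)
print(items)
```

ElementTree only implements a subset of XPath (no following-sibling, for example), so in the spider itself keep using Scrapy's rel.xpath() with the leading ".".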
