
Scrapy - response.xpath: get items back separated

I am trying to scrape a web page that has multiple blog entries on its first page.
This is my code so far:

for rel in response.xpath('//*[@id="content"]/div[*]/div/comment()[2]'):
    item = Example()
    item['title'] = rel.xpath('//*[@id="content"]/div[*]/div/div/input/@value').extract()
    item['link'] = rel.xpath('//*[@id="content"]/div[*]/div/div/span[4]/a/@href').extract()
    yield item

The problem is that if I use "*", I get back a single title and a single link, each containing all the entries.
What I want is one title and one link for every single entry.
I am very new to Python and Scrapy and don't know how to step through the indices to get the individual entries back.
The first entry is at index 2 and each following one is 3 higher, ending at 29 (2, 5, 8, ..., 29).

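Let me suggest more explicit XPaths. An expression that starts with // is evaluated against the whole document even when it is called on rel, which is why each item ends up holding every title and link; starting the inner expressions with .// keeps them relative to the current entry. Something like this should be closer to your goal: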

for rel in response.xpath('//div[@class="beschreibung"]'):
    item = Example()  # create a fresh item for each entry
    # the leading "." keeps each XPath relative to the current entry
    item['title'] = rel.xpath('.//strong[contains(text(),"Release")]/following-sibling::*[1]/@value').extract()
    item['link'] = rel.xpath('.//span[@style="display:inline;"]//a[contains(text(),"Share")]/@href').extract()
    yield item
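
To see why the relative lookups matter, here is a minimal sketch that uses only the standard library's ElementTree as a stand-in for Scrapy selectors. The markup, the `PAGE` constant and the `parse_entries` helper are made up for illustration and are not the real page; the point is only that querying from `rel` returns that entry's own values, one item per entry.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for the blog page (the real structure is assumed).
PAGE = """
<div id="content">
  <div class="beschreibung">
    <input value="First post"/>
    <a href="/post/1">Share</a>
  </div>
  <div class="beschreibung">
    <input value="Second post"/>
    <a href="/post/2">Share</a>
  </div>
</div>
"""


def parse_entries(markup):
    """Return one dict per entry, each with that entry's own title and link."""
    root = ET.fromstring(markup)
    items = []
    for rel in root.findall(".//div[@class='beschreibung']"):
        # Searching from rel (not from root) only sees this entry's descendants.
        title = rel.find(".//input").get("value")
        link = rel.find(".//a").get("href")
        items.append({"title": title, "link": link})
    return items


if __name__ == "__main__":
    for entry in parse_entries(PAGE):
        print(entry)
```

In Scrapy the equivalent is calling rel.xpath('.//...') inside the loop, as in the snippet above, instead of repeating the absolute '//*[@id="content"]/...' path.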

