Scrapy/Python/XPath - How to extract data from within data?

I'm new to Scrapy, and I've just started looking into XPath.

I'm trying to extract titles and links from HTML list items within a div. This is how I thought I'd go about it: select the ul inside the div by id, then loop through its list items:

def parse(self, response):
    for t in response.xpath('//*[@id="categories"]/ul'):
        for x in t.xpath('//li'):
            item = TgmItem()
            item['title'] = x.xpath('a/text()').extract()
            item['link'] = x.xpath('a/@href').extract()
            yield item

But this gave me the same results as the following attempt, which has no outer loop:

def parse(self, response):
    for x in response.xpath('//li'):
        item = TgmItem()
        item['title'] = x.xpath('a/text()').extract()
        item['link'] = x.xpath('a/@href').extract()
        yield item

In both cases, the exported CSV file contains li data from the whole page source, top to bottom.

I'm not an expert, and I've made a number of attempts. If anyone could shed some light on this, it would be appreciated.

You need to start the XPath expression used inside the inner loop with a dot:

for t in response.xpath('//*[@id="categories"]/ul'):
    for x in t.xpath('.//li'):

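This makes it search within the scope of the current element (each ul matched by the outer loop) rather than the whole page. An expression that starts with // is always evaluated from the document root, even when it is called on a nested selector.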

For more explanation, see "Working with relative XPaths" in the Scrapy selectors documentation.
