Join base URL to relative URL in Scrapy

I'm new to Python and Scrapy. I'm having trouble joining the base URL to the scraped link. I've tried a number of suggestions, but I'm probably implementing them incorrectly:

def parse(self, response):
    for ad_links in response.xpath('//div[@class="view"][1]//a'):
        yield {
            'title': item.xpath('text()').extract(),
            relative_url = item.xpath('@href').extract(),
            'link': response.urljoin(relative_url),
            }

Any suggestions would be really appreciated. Thanks!

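You cannot assign a variable inside the dictionary you are yielding: relative_url = ... is an assignment statement, not a key: value pair, so Python rejects it with a SyntaxError. Compute the value before the yield instead.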
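A quick way to confirm this is to compile both forms with Python's built-in compile(). The two source strings below are simplified stand-ins for the question's code (the values are made up), and is_valid_python() is a hypothetical helper written only for this check:

```python
# The question's pattern: an assignment statement inside a dict literal.
broken = """
item = {
    'title': 'Some ad',
    relative_url = '/ad/42',
}
"""

# The fix: compute the value first, then reference it inside the dict.
fixed = """
relative_url = '/ad/42'
item = {
    'title': 'Some ad',
    'link': relative_url,
}
"""


def is_valid_python(source: str) -> bool:
    """Return True if `source` compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


print(is_valid_python(broken))
print(is_valid_python(fixed))
```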

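Also make sure you understand the difference between extract() and extract_first(): extract() returns a list of every match, while extract_first() returns only the first match as a string (or None if nothing matched). response.urljoin() expects a single string, so extract_first() is the method to use here. See the documentation.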
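Scrapy documents Response.urljoin() as a wrapper around urllib.parse.urljoin() using the response URL as the base, so the list-versus-string problem can be reproduced with the standard library alone. In this sketch the page URL is made up, plain Python lists stand in for selector results, and first_or_none() and try_join() are hypothetical helpers (first_or_none() only mimics extract_first()):

```python
from urllib.parse import urljoin

# Hypothetical page URL; plain lists stand in for Scrapy selector results.
page_url = "https://example.com/listings/"
matches = ["/ad/1"]  # extract() always returns a list, even for one match


def first_or_none(values):
    """Hypothetical helper mimicking extract_first(): first item or None."""
    return values[0] if values else None


def try_join(base, url):
    """Return the joined URL, or the TypeError urljoin() raised."""
    try:
        return urljoin(base, url)
    except TypeError as exc:
        return exc


from_list = try_join(page_url, matches)                  # list -> TypeError
from_first = try_join(page_url, first_or_none(matches))  # str -> joined URL
print(repr(from_list))
print(from_first)
```

In recent Scrapy versions, .get() and .getall() are also available as shorter equivalents of extract_first() and extract().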

Also, what is this item variable? It should be ad_links, right?

Try this:

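def parse(self, response):
    # Loop over every <a> inside the first div with class="view"
    for ad_links in response.xpath('//div[@class="view"][1]//a'):
        # extract_first() returns a single string (or None), not a list
        relative_url = ad_links.xpath('@href').extract_first()
        yield {
            'title': ad_links.xpath('text()').extract_first(),
            # Resolve the relative href against the page URL
            'link': response.urljoin(relative_url),
        }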
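To see what response.urljoin() produces for different kinds of href values, here is a standard-library sketch using urllib.parse.urljoin(), which Scrapy's method wraps (for HTML responses Scrapy may use a <base href> tag as the base instead of the page URL). The page URL and href values are made-up examples:

```python
from urllib.parse import urljoin

# Hypothetical URL of the scraped listing page.
page_url = "https://example.com/listings/page1.html"

# Typical href forms; each is resolved against page_url.
hrefs = [
    "ad/42",                    # relative to the current directory
    "/ad/42",                   # relative to the site root
    "../about",                 # parent directory
    "?page=2",                  # query string only
    "//cdn.example.com/a.png",  # protocol-relative: takes the page's scheme
    "https://other.example/x",  # already absolute: returned unchanged
    None,                       # <a> with no href: extract_first() gives None
]

resolved = {href: urljoin(page_url, href) for href in hrefs}
for href, absolute in resolved.items():
    print(f"{href!r:28} -> {absolute}")
```

The last entry is the one to watch: if an <a> has no href, extract_first() returns None and urljoin() silently returns the page URL itself. If that matters, a check such as if relative_url: before the yield skips those links.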
