Join base url to relative url in scrapy

Question

I'm new to Python and Scrapy. I'm having trouble joining the base URL to the scraped link. I've tried a number of suggestions, but I'm probably applying them incorrectly.

def parse(self, response):
    for ad_links in response.xpath('//div[@class="view"][1]//a'):
        yield {
            'title': item.xpath('text()').extract(),
            relative_url = item.xpath('@href').extract(),
            'link': response.urljoin(relative_url),
            }

Any suggestions would be really appreciated. Thanks.

Answer 1

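You cannot assign a variable inside the dictionary you are yielding. A dict literal only accepts key: value pairs, so the relative_url = ... line is a syntax error.

Also make sure you understand the difference between extract() and extract_first(): extract() returns a list of all matches, while extract_first() returns only the first match as a string (or None if nothing matched). I suspect extract_first() is the method to use here. See the Scrapy selectors documentation.

What is this item variable? It should be ad_links, right?

Try this: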

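def parse(self, response):
    for ad_links in response.xpath('//div[@class="view"][1]//a'):
        # extract_first() gives a single string (or None), not a list
        relative_url = ad_links.xpath('@href').extract_first()
        yield {
            'title': ad_links.xpath('text()').extract_first(),
            # resolve the relative href against the URL of the current page
            'link': response.urljoin(relative_url),
        }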
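
To see what the join does without running a spider, here is a minimal sketch using the standard library's urljoin, which Scrapy's response.urljoin builds on. The page URL, the hrefs, and the to_absolute helper are made up for illustration; in a real spider the base comes from the response itself.

```python
from urllib.parse import urljoin


def to_absolute(page_url, href):
    """Resolve href against page_url (hypothetical helper, not part of Scrapy)."""
    # extract_first() returns None when nothing matched, so guard for that case
    if href is None:
        return None
    return urljoin(page_url, href)


# Assumed example values, standing in for response.url and the extracted @href
page_url = "https://example.com/ads/page/2"

# A root-relative href replaces the whole path
print(to_absolute(page_url, "/item/42"))  # https://example.com/item/42

# A plain relative href replaces only the last path segment
print(to_absolute(page_url, "item/42"))   # https://example.com/ads/page/item/42

# Passing a list (what extract() returns) instead of a string fails
try:
    to_absolute(page_url, ["/item/42"])
except TypeError as exc:
    print("list instead of str:", exc)
```

This is why the original attempt with extract() could not work even once the syntax error was fixed: the join needs one string, not a list of strings.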

1 answer

solution1
1 ACCEPTED 2018-10-04 14:50:19