
How to extract all href content from a page using scrapy

I am trying to crawl this page.

I want to get all links from a given website using Scrapy

I am trying it this way:

import scrapy
import unidecode
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from lxml import html


class ElementSpider(scrapy.Spider):
    name = 'linkdata'

    start_urls = ["https://www.goodreads.com/list/show/19793.I_Marked_My_Calendar_For_This_Book_s_Release",]


    def parse(self, response):

        links = response.xpath('//div[@id="all_votes"]/table[@class="tableList js-dataTooltip"]/div[@class="js-tooltipTrigger tooltipTrigger"]/a/@href').extract()
        print(links)

But I am getting nothing in the output.

I think your XPath is wrong. Try this:

for href in response.xpath('//div[@id="all_votes"]/table[@class="tableList js-dataTooltip"]/tr/td[2]/div[@class="js-tooltipTrigger tooltipTrigger"]/a/@href'):
    full_url = response.urljoin(href.extract())
    print(full_url)
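
Putting the corrected XPath back into your spider, the whole thing would look roughly like this. This is an untested sketch: it yields a dict per link instead of printing, which is an assumption on my part so the URLs show up in Scrapy's crawl output or exported feed; adjust to whatever you actually want to do with them.

import scrapy


class ElementSpider(scrapy.Spider):
    name = 'linkdata'
    start_urls = [
        "https://www.goodreads.com/list/show/19793.I_Marked_My_Calendar_For_This_Book_s_Release",
    ]

    def parse(self, response):
        # The tooltip divs sit inside the table rows, so the path has to go
        # through tr/td before it reaches the div that wraps the link.
        hrefs = response.xpath(
            '//div[@id="all_votes"]/table[@class="tableList js-dataTooltip"]'
            '/tr/td[2]/div[@class="js-tooltipTrigger tooltipTrigger"]/a/@href'
        ).extract()
        for href in hrefs:
            # urljoin turns the relative book paths into absolute URLs.
            # Yielding a dict is just one option; printing works too.
            yield {'url': response.urljoin(href)}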

Hope it helps :)

Good luck...
