
How to extract all href content from a page using scrapy

I am trying to crawl this page.

I want to get all links from a given website using Scrapy

I am trying it this way:

import scrapy
import unidecode
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from lxml import html


class ElementSpider(scrapy.Spider):
    name = 'linkdata'

    start_urls = ["https://www.goodreads.com/list/show/19793.I_Marked_My_Calendar_For_This_Book_s_Release",]


    def parse(self, response):

        links = response.xpath('//div[@id="all_votes"]/table[@class="tableList js-dataTooltip"]/div[@class="js-tooltipTrigger tooltipTrigger"]/a/@href').extract()
        print(links)

But I am getting nothing in the output.

I think your XPath is wrong. Try this:

for href in response.xpath('//div[@id="all_votes"]/table[@class="tableList js-dataTooltip"]/tr/td[2]/div[@class="js-tooltipTrigger tooltipTrigger"]/a/@href'):
    full_url = response.urljoin(href.extract())
    print(full_url)
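
Putting the corrected XPath back into your spider, the whole thing would look roughly like this. This is an untested sketch: it yields a dict per link instead of printing, which is an assumption on my part so the URLs show up in Scrapy's crawl output or exported feed; adjust to whatever you actually want to do with them.

import scrapy


class ElementSpider(scrapy.Spider):
    name = 'linkdata'
    start_urls = [
        "https://www.goodreads.com/list/show/19793.I_Marked_My_Calendar_For_This_Book_s_Release",
    ]

    def parse(self, response):
        # The tooltip divs sit inside the table rows, so the path has to go
        # through tr/td before it reaches the div that wraps the link.
        hrefs = response.xpath(
            '//div[@id="all_votes"]/table[@class="tableList js-dataTooltip"]'
            '/tr/td[2]/div[@class="js-tooltipTrigger tooltipTrigger"]/a/@href'
        ).extract()
        for href in hrefs:
            # urljoin turns the relative book paths into absolute URLs.
            # Yielding a dict is just one option; printing works too.
            yield {'url': response.urljoin(href)}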

Hope it helps :)

Good luck...
