I am trying to crawl this page. I want to get all the links from a given website using Scrapy, and this is what I am trying:
import scrapy
import unidecode
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor
from lxml import html


class ElementSpider(scrapy.Spider):
    name = 'linkdata'
    start_urls = ["https://www.goodreads.com/list/show/19793.I_Marked_My_Calendar_For_This_Book_s_Release",]

    def parse(self, response):
        links = response.xpath('//div[@id="all_votes"]/table[@class="tableList js-dataTooltip"]/div[@class="js-tooltipTrigger tooltipTrigger"]/a/@href').extract()
        print links
But I am getting nothing in the output.
I think your XPath is wrong. The tooltip divs are not direct children of the table; they sit inside the second cell of each row, so you need to go through tr/td[2]. Try this:
for href in response.xpath('//div[@id="all_votes"]/table[@class="tableList js-dataTooltip"]/tr/td[2]/div[@class="js-tooltipTrigger tooltipTrigger"]/a/@href'):
    full_url = response.urljoin(href.extract())
    print full_url
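For context, here is a minimal sketch of how that corrected XPath could fit into the whole spider, assuming Python 3 and a recent Scrapy release; it yields each URL as an item instead of printing it, so the results can be exported from the command line.

import scrapy


class ElementSpider(scrapy.Spider):
    name = 'linkdata'
    start_urls = [
        "https://www.goodreads.com/list/show/19793.I_Marked_My_Calendar_For_This_Book_s_Release",
    ]

    def parse(self, response):
        # Assumption based on the corrected XPath above: each book occupies a
        # <tr> of the list table, and the tooltip-trigger div with the book
        # link sits in the second <td>, hence the /tr/td[2] step.
        for href in response.xpath(
                '//div[@id="all_votes"]/table[@class="tableList js-dataTooltip"]'
                '/tr/td[2]/div[@class="js-tooltipTrigger tooltipTrigger"]/a/@href'
        ).extract():
            # urljoin turns the relative /book/show/... path into a full URL.
            yield {'url': response.urljoin(href)}

You could then run it with something like scrapy crawl linkdata -o links.json to write the collected URLs to a file.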
Hope it helps :)
Good luck...