I am trying to extract data from this link with Scrapy. I want to loop through these URLs with page=1 through roughly the top 100 pages and extract every instance of <a href="/@eberhardgross">, for example. Ultimately I am just trying to grab the username there, but there are other <a href=""> elements on the page. If I could extract just the usernames that would be great, but if I have to get all <a href=""> elements that's fine too; I can sort them afterwards and keep just the ones with an @. Just wondering if I can do this with Scrapy?
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        url = "https://www.pexels.com/leaderboard/all-time.js?format=js&seed=&page=%(page_number)s&type="
        page_to_crawl = 100
        for page_number in range(page_to_crawl):
            yield scrapy.Request(url % {'page_number': page_number}, self.parse)

    def parse(self, response):
        usernames = response.xpath('//a[contains(@href, "@")]/@href').getall()
To crawl several pages, you can use start_requests to iterate over the pages:
def start_requests(self):
    url = "https://www.pexels.com/leaderboard/all-time.js?format=js&seed=&page=%(page_number)s&type="
    page_to_crawl = 100
    # pages on the site start at 1, so iterate 1..100 rather than 0..99
    for page_number in range(1, page_to_crawl + 1):
        yield scrapy.Request(url % {'page_number': page_number}, self.parse)
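As a quick sanity check outside Scrapy, the %-style mapping format above expands to one concrete URL per page. A minimal sketch using the same template string (page numbers assumed to start at 1):

```python
# The %(page_number)s placeholder is filled from a dict,
# producing one leaderboard URL per page.
url = "https://www.pexels.com/leaderboard/all-time.js?format=js&seed=&page=%(page_number)s&type="

urls = [url % {'page_number': n} for n in range(1, 4)]
for u in urls:
    print(u)
```

Each printed URL differs only in its page= value, which is what the spider relies on to walk the leaderboard.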
And in your parse method, you can use XPath to get every href that contains an @:
def parse(self, response):
    usernames = response.xpath('//a[contains(@href, "@")]/@href').getall()
    yield {
        'usernames': usernames,
    }
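Since the hrefs come back in the form "/@eberhardgross", a small post-processing step can reduce them to bare usernames. A sketch on hypothetical sample data, assuming profile links always start with "/@":

```python
# Hypothetical sample of hrefs as the XPath above might return them.
hrefs = ["/@eberhardgross", "/photo/12345/", "/@anotheruser"]

# Keep only profile links and strip the leading "/@" to get the username.
usernames = [h[len("/@"):] for h in hrefs if h.startswith("/@")]
print(usernames)  # -> ['eberhardgross', 'anotheruser']
```

You could apply the same filter inside parse before yielding, so the item contains usernames instead of raw hrefs.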