
Scrapy - Retrieve an authentication token from a JavaScript response

I need help with this specific scenario.

Scenario

  1. Call the site

http://www.example.com/index.php

From a <script> tag on that page, I can get this URL:

https://www.example.com/anotherpage.php?key=ABCDFG
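A minimal sketch of pulling the key out of the script text, assuming the anotherpage.php URL appears as a literal string inside the <script> tag. The regular expression and the extract_key helper are hypothetical; in a Scrapy callback the script text could come from response.xpath("//script/text()").getall().

```python
import re
from urllib.parse import parse_qs, urlparse

# Assumption: the URL is written literally in the script, quoted or space-delimited.
SCRIPT_URL_RE = re.compile(r"https?://[^\s'\"]*anotherpage\.php\?[^\s'\"]*")


def extract_key(script_text):
    """Return the 'key' query parameter of the anotherpage.php URL, or None."""
    match = SCRIPT_URL_RE.search(script_text)
    if match is None:
        return None
    query = parse_qs(urlparse(match.group(0)).query)
    return query.get("key", [None])[0]


script = "var next = 'https://www.example.com/anotherpage.php?key=ABCDFG';"
print(extract_key(script))  # ABCDFG
```

Parsing the query string with urllib.parse instead of a second regex keeps the helper working if the URL gains extra parameters.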

Using the key, I then have to call this endpoint

https://www.example.com/login.php?key=ABCD
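Building that URL can be sketched as follows, assuming key is the only query parameter login.php needs. The login_url helper is hypothetical; urlencode escapes the value in case the key ever contains characters such as & or spaces.

```python
from urllib.parse import urlencode

LOGIN_ENDPOINT = "https://www.example.com/login.php"


def login_url(key):
    """Build the login.php URL for a given key, URL-encoding the value."""
    return f"{LOGIN_ENDPOINT}?{urlencode({'key': key})}"


print(login_url("ABCDFG"))  # https://www.example.com/login.php?key=ABCDFG
```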

to retrieve the session ID, which is stored inside the JavaScript response:

-- omitted

private._sessID='MYSESSIONID';

-- omitted
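The regex step for the session ID might look like this, assuming the assignment always has the form shown above (single or double quotes around the value). The SESSION_RE pattern and extract_session_id helper are hypothetical; in a Scrapy callback you would pass it response.text, since a JavaScript body is not HTML and XPath/CSS selectors do not help there.

```python
import re

# Matches e.g.  private._sessID='MYSESSIONID';  and captures the quoted value.
SESSION_RE = re.compile(r"""_sessID\s*=\s*['"]([^'"]+)['"]""")


def extract_session_id(js_text):
    """Return the session ID assigned to _sessID, or None if it is absent."""
    match = SESSION_RE.search(js_text)
    return match.group(1) if match else None


body = "-- omitted\nprivate._sessID='MYSESSIONID';\n-- omitted"
print(extract_session_id(body))  # MYSESSIONID
```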

Finally, using this session ID and sending the right POST requests, I can navigate to all the pages I need.

Where I'm stuck

I can reproduce all of these steps in the Scrapy shell using regular expressions (and everything works), but I don't know how to chain them inside a Scrapy spider before data extraction starts.

Can someone help me out?

Start with the base URL http://www.example.com/index.php by requesting it in the spider's start_requests() method. In its callback, extract the information you need, request the next endpoint with another callback, and pass that result on to the next callback in the same way. Once you have the session ID, you can start the scraping process.

Implement it along these lines:

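import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"

    def start_requests(self):
        # note the plural name; Scrapy expects an iterable of requests
        yield scrapy.Request("http://www.example.com/index.php",
                             callback=self.parse_authentication_token)

    def parse_authentication_token(self, response):
        # extract the token (or whatever is required), then hand over to parse()
        yield from self.parse(response)

    def parse(self, response):
        yield {"url": response.url}  # data extraction goes here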
