
Scrapy - Retrieve Authentication Token from javascript script response

I need help with this specific scenario.

Scenario

  1. Calling the site

http://www.example.com/index.php

I can get this information from a <script> tag:

https://www.example.com/anotherpage.php?key=ABCDFG
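
Extracting the key from the <script> tag can be sketched with a regular expression. This is a minimal sketch: the sample markup and the names SAMPLE_HTML, KEY_URL_RE and extract_key are assumptions made for illustration, since the real page body is not shown in the question.

```python
import re

# Hypothetical page body: the real index.php markup is not shown in the question.
SAMPLE_HTML = """
<html><head>
<script type="text/javascript">
  var next = "https://www.example.com/anotherpage.php?key=ABCDFG";
</script>
</head><body></body></html>
"""

# Match the anotherpage.php URL and capture the value of its key parameter.
# The allowed key characters (letters and digits) are an assumption.
KEY_URL_RE = re.compile(r"anotherpage\.php\?key=([A-Za-z0-9]+)")


def extract_key(html):
    """Return the key found in the page, or None if the pattern is absent."""
    match = KEY_URL_RE.search(html)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_key(SAMPLE_HTML))
```

Inside a spider callback, the same pattern could probably be applied with response.css("script::text").re_first(...), which returns the first captured group or None.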

Using the key, I have to call this endpoint

https://www.example.com/login.php?key=ABCD

to retrieve the SessionID, which is stored inside the JavaScript response:

-- omitted

private._sessID='MYSESSIONID';

-- omitted
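
Pulling the session ID out of that response can be sketched the same way. The surrounding JavaScript is omitted in the question, so SAMPLE_JS below is a hypothetical fragment built around the one line that is shown, and extract_session_id is an illustrative helper name.

```python
import re

# Hypothetical fragment of the login.php response. Only the _sessID line comes
# from the question; the lines around it are placeholders.
SAMPLE_JS = "var a = 1;\nprivate._sessID='MYSESSIONID';\nvar b = 2;\n"

# Capture whatever sits between the single quotes after _sessID=.
SESSION_RE = re.compile(r"_sessID\s*=\s*'([^']+)'")


def extract_session_id(js_text):
    """Return the session ID from the JavaScript text, or None if not found."""
    match = SESSION_RE.search(js_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_session_id(SAMPLE_JS))
```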

Finally, using this session ID and performing the right POST requests, I can navigate to all the pages I need.

My stalemate

I'm able to simulate all the steps in the Scrapy shell using regular expressions (and everything works fine), but I don't know how to manage these steps inside a Scrapy spider before data extraction starts.

Can someone help me out?

You need to start with the base URL http://www.example.com/index.php by requesting it in the start_requests method. In its callback, extract the information needed to call the other endpoint, pass that result on to a further callback, and only then start the scraping process.

You can implement it along the following lines:

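import scrapy


class AuthSpider(scrapy.Spider):
    name = "auth_spider"

    def start_requests(self):
        # Request the index page first so the token can be read from it.
        yield scrapy.Request("http://www.example.com/index.php", callback=self.parse_authentication_token)

    def parse_authentication_token(self, response):
        # Extract the token (or whatever else is required) here, then hand over to parse.
        yield from self.parse(response)

    def parse(self, response):
        # Regular data extraction starts here.
        yield {"url": response.url}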
