How to manually authenticate before Scrapy runs?

Question

I want to scrape a web page that puts a ridiculous number of captcha challenges in front of the login (e.g. more than 20 challenges in sequence).

How can I log in by solving the captchas myself, with my physical hands (i.e. not with Selenium etc.), and then have the web scraping run? I have looked for code that does this in the Scrapy documentation, in tutorials, and through web searches, and found nothing.

Obligatory code that doesn't do the thing I am asking how to do:

import scrapy

class BadSpider(scrapy.Spider):
    name = "bad"

    def start_requests(self):
        [...]

    def parse(self, response):
        if (response.url.endswith('/login')):
            print('!!!!! I have no idea what to do here!!!!')
        else:
            [...]

I want it to start after I have manually authenticated. Instead, it starts before I have logged in, so I cannot go any further.

Answer 1

Authenticate manually in your browser
Then open your browser's DevTools
Navigate to the Network tab
Reload the page you want to scrape
In the Network tab, right-click the first request and look for the Copy as cURL (bash) option
Go to https://curl.trillworks.com/ and paste the copied command
Copy the headers and cookies from the result, use them in your Scrapy requests, and you are done
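
As a rough sketch of the last step, the snippet below prepares the copied values for Scrapy. It assumes the copied headers arrive as a plain dict that may still contain a raw Cookie entry. The helper name split_cookie_header and every header and cookie value shown are placeholders invented for illustration, not values from any real site.

```python
def split_cookie_header(headers):
    """Separate a raw Cookie header from the other copied request headers.

    Returns a tuple (other_headers, cookies) of plain dicts.
    """
    other_headers = {}
    cookies = {}
    for name, value in headers.items():
        if name.lower() == "cookie":
            # A Cookie header looks like "name1=value1; name2=value2".
            for part in value.split(";"):
                part = part.strip()
                if not part:
                    continue
                # partition keeps any "=" that appears inside the value.
                key, _, val = part.partition("=")
                cookies[key] = val
        else:
            other_headers[name] = value
    return other_headers, cookies


# Placeholder values (assumption): paste the ones copied from your own
# logged-in browser session instead.
COPIED_HEADERS = {
    "User-Agent": "Mozilla/5.0 (placeholder)",
    "Accept-Language": "en-US,en;q=0.5",
    "Cookie": "sessionid=abc123; csrftoken=xyz789",
}

HEADERS, COOKIES = split_cookie_header(COPIED_HEADERS)
```

Inside a spider's start_requests, these two dicts could then be passed as scrapy.Request(url, headers=HEADERS, cookies=COOKIES, callback=self.parse), so that the crawl starts with the session you established by hand. Keeping the cookies out of the headers dict lets Scrapy's cookie handling manage them. The approach only works for as long as the site keeps that session valid.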

PS: I would suggest doing this in Mozilla Firefox, because the cURL command copied from Chrome's DevTools sometimes produces incorrect results in https://curl.trillworks.com/

Answer 1: 2019-06-12 06:44:01