How can I manually authenticate before Scrapy runs?

I want to scrape a web page that makes me solve a ridiculous number of captcha challenges before I can log in (e.g. more than 20 in a row).

How can I log in by solving the captchas myself, by hand (i.e. not with Selenium or similar tools), and then have the scraping run? I have searched the Scrapy documentation, tutorials and the web for code that does this and found nothing.

Obligatory code that doesn't do the thing that I am asking how to do:

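import scrapy

class BadSpider(scrapy.Spider):
    name = "bad"

    def start_requests(self):
        ...  # [...] request setup elided in the original

    def parse(self, response):
        # The site redirects to /login because this session is not authenticated
        if response.url.endswith('/login'):
            print('!!!!! I have no idea what to do here!!!!')
        else:
            ...  # [...] scraping logic elided in the original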

I want the spider to start after I have authenticated manually. Instead, it starts before I have logged in, so it cannot get any further.

  1. Authenticate manually in your browser.
  2. Open your browser's DevTools.
  3. Go to the Network tab.
  4. Reload the page you want to scrape.
  5. In the Network tab, right-click the first request and choose the Copy as cURL (bash) option.
  6. Go to https://curl.trillworks.com/ and paste the command.
  7. Copy the resulting headers and cookies into your spider's requests, and you are done.
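Step 7 mostly comes down to turning the copied `Cookie:` header into the `cookies` dict that `scrapy.Request` accepts. Below is a minimal standard-library sketch, assuming the header has the usual `name=value; name=value` shape; `cookie_header_to_dict` is a hypothetical helper name, not part of Scrapy.

```python
def cookie_header_to_dict(cookie_header: str) -> dict:
    """Turn a raw Cookie header copied from DevTools into a dict.

    Accepts "name1=value1; name2=value2", optionally prefixed with
    "Cookie: " as it appears in a copied cURL command.
    """
    # Drop a leading "Cookie:" header name if it was copied along
    if cookie_header.lower().startswith("cookie:"):
        cookie_header = cookie_header.split(":", 1)[1]

    cookies = {}
    for pair in cookie_header.split(";"):
        # Split on the first "=" only, since values may contain "="
        name, sep, value = pair.strip().partition("=")
        if sep:  # skip empty fragments such as a trailing ";"
            cookies[name] = value
    return cookies
```

You would then pass the result along with the copied headers, e.g. `scrapy.Request(url, cookies=cookie_header_to_dict(raw_header), headers=copied_headers)`. Session cookies usually expire after a while; when the spider starts landing on `/login` again, log in by hand once more and copy a fresh header.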

PS: I would suggest performing these steps in Mozilla Firefox, because Chrome's DevTools sometimes produces a cURL command that https://curl.trillworks.com/ converts incorrectly.
