How can I manually authenticate before scrapy runs?

Question

I want to scrape a web page that throws a ridiculous number of captcha challenges at me before I can log in (e.g. more than 20 challenges in a row).

How can I log in by solving the captchas myself, by hand (i.e. not with Selenium or similar automation), and then have the scraping run? I have searched the Scrapy documentation, tutorials and the web for code that does this and found nothing.

Obligatory code that doesn't do the thing that I am asking how to do:

import scrapy

class BadSpider(scrapy.Spider):
    name = "bad"

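    def start_requests(self):
        [...]  # requests for the pages to scrape

    def parse(self, response):
        # Unauthenticated requests end up on the site's login page
        if response.url.endswith('/login'):
            print('!!!!! I have no idea what to do here!!!!')
        else:
            [...]  # normal parsing of the logged-in pages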

I want the spider to start only after I have authenticated manually. Instead, it starts before I have logged in, so it cannot get past the login page.

Answer 1

1. Authenticate manually in your browser (solving the captchas yourself).
2. Open your browser's DevTools.
3. Go to the Network tab.
4. Reload the page you want to scrape.
5. In the Network tab, right-click the first request and look for the Copy as cURL (bash) option.
6. Go to https://curl.trillworks.com/ and paste the copied command; it converts it into Python code.
7. Copy the headers and cookies from the generated code into your spider's requests, and boom, you are done.
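If you would rather not paste into a website, the header/cookie extraction from steps 6 and 7 can be sketched locally in Python. This is a minimal sketch, assuming the command was copied with Copy as cURL (bash), so headers appear as -H '...' and cookies either as a Cookie header or as -b '...'. The function names curl_to_headers_and_cookies and parse_cookie_string are hypothetical helpers, not part of Scrapy, and every other curl option (method, body, --compressed, ...) is ignored.

```python
import shlex


def parse_cookie_string(cookie_string):
    """Hypothetical helper: turn 'a=1; b=2' into {'a': '1', 'b': '2'}."""
    cookies = {}
    for pair in cookie_string.split(";"):
        name, sep, value = pair.strip().partition("=")
        if sep:  # skip fragments without '='
            cookies[name] = value
    return cookies


def curl_to_headers_and_cookies(curl_command):
    """Hypothetical helper: pull headers and cookies out of a copied cURL command.

    Only -H/--header and -b/--cookie are handled; all other options are skipped.
    """
    tokens = shlex.split(curl_command)  # respects the bash-style quoting
    headers = {}
    cookies = {}
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token in ("-H", "--header") and i + 1 < len(tokens):
            # Split "Name: value" on the first colon only (values may contain ':')
            name, _, value = tokens[i + 1].partition(":")
            name, value = name.strip(), value.strip()
            if name.lower() == "cookie":
                cookies.update(parse_cookie_string(value))
            else:
                headers[name] = value
            i += 2
        elif token in ("-b", "--cookie") and i + 1 < len(tokens):
            cookies.update(parse_cookie_string(tokens[i + 1]))
            i += 2
        else:
            i += 1
    return headers, cookies
```

The two dicts can then be passed to your spider's requests, for example scrapy.Request(url, headers=headers, cookies=cookies, callback=self.parse) inside start_requests. Newer Scrapy versions also provide Request.from_curl(), which builds a request directly from the copied cURL command.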

PS: I suggest doing this in Mozilla Firefox, because the cURL command copied from Chrome's DevTools sometimes gives incorrect results on https://curl.trillworks.com/

solution1
1 2019-06-12 06:44:01
