I'm trying to scrape some sites that require solving a captcha to log in. The best way I have found to do so is to use an external service like https://anti-captcha.com/, which has a person on the other side solve the captcha and sends back a token (a hash-like value) to verify the result.
As in the documentation, the process is:
The issue is that the actual request I need to make requires two other values besides that one:
* __RequestVerificationToken: This one appears on the login page:
But the value sent with the login request is different, so there is some intermediate step that I'm missing.
* RecaptchaToken: There is no trace of this value on the login page. I suspect it is generated on the back end as an additional verification step, but I have not found any information about it.
My last concern about this process is that the anti-captcha service seems to be solving some generic captcha rather than the one I'm seeing. I'm not sure whether that is an actual issue, though.
I believe you are talking about reCAPTCHA v2, which asks the user to select the images that contain a certain object.
According to the documentation, once the user solves the reCAPTCHA image puzzle and clicks "Verify", the browser sends the (encoded) answer to Google and gets back a token, which the widget stores in a form field called "g-recaptcha-response". That token is submitted with the form, and the site's backend checks whether the solution was correct by sending a POST request with the token to https://www.google.com/recaptcha/api/siteverify.
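To make the verification step concrete, here is a minimal sketch of what the site's backend does with the token. The `secret`, `response` and optional `remoteip` form fields and the `success` key in the JSON reply follow Google's documented siteverify interface; the function names are hypothetical, and as a scraper you never call this yourself because you don't have the site's secret key.

```python
import json
import urllib.parse
import urllib.request

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_siteverify_request(secret, token, remote_ip=None):
    """Build the form-encoded POST the backend sends to Google."""
    fields = {"secret": secret, "response": token}
    if remote_ip:
        fields["remoteip"] = remote_ip  # optional field
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(SITEVERIFY_URL, data=body, method="POST")


def token_is_valid(raw_json):
    """Read the JSON reply; "success" is true only for a valid token."""
    return bool(json.loads(raw_json).get("success"))
```

The backend would pass the request to `urllib.request.urlopen` and feed the response body to `token_is_valid`. The point for scraping is that the token is checked server-side against the site's own key, so it has to be a token issued for that site.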
So a standard reCAPTCHA v2 integration needs only one token to validate the user's response. That is not the case you are facing here: this looks like a custom implementation intended specifically to make these sites harder for unwanted parties to scrape or crawl.
The site appears to add two extra tokens that are uniquely generated and injected into the page that shows the captcha puzzle. By requiring those extra tokens, it makes sure that the "g-recaptcha-response" comes from the same page that the user actually loaded in their browser.
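If a token is rendered into the page as a hidden form field, it can be read straight from the HTML. The sketch below uses only the standard library; `HiddenInputParser`, `hidden_fields` and the `SAMPLE` markup are made up for illustration and are not taken from the actual site.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


# Hypothetical markup, only to show the shape of the result.
SAMPLE = (
    '<form><input name="__RequestVerificationToken" type="hidden" '
    'value="abc123" /><input name="user" type="text"></form>'
)
print(hidden_fields(SAMPLE))
```

A value that JavaScript generates or rewrites after the page loads will not show up in the static HTML this way, which is one reason to drive a real browser instead.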
You need to inject the g-recaptcha-response returned by the captcha-solving API into the same page you are visiting, and then simulate the complete user interaction with that page.
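For reference, getting that token from the solving service could look roughly like the sketch below. It assumes the anti-captcha JSON API as I understand it (`createTask` and `getTaskResult` endpoints, a `RecaptchaV2TaskProxyless` task type, and the token under `solution.gRecaptchaResponse`), so check the names against the current documentation. The function names, polling interval and retry count are arbitrary choices for this example.

```python
import json
import time
import urllib.request

API_BASE = "https://api.anti-captcha.com"  # assumed base URL


def post_json(url, payload):
    """POST a JSON body and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def solve_recaptcha_v2(client_key, page_url, site_key, post=post_json,
                       poll_seconds=5, max_polls=24, sleep=time.sleep):
    """Create a solving task, then poll until the token is ready."""
    created = post(API_BASE + "/createTask", {
        "clientKey": client_key,
        "task": {
            "type": "RecaptchaV2TaskProxyless",  # assumed task type name
            "websiteURL": page_url,
            "websiteKey": site_key,  # the data-sitekey found on the login page
        },
    })
    if created.get("errorId"):
        raise RuntimeError("createTask failed: %r" % (created,))
    task_id = created["taskId"]
    for _ in range(max_polls):
        sleep(poll_seconds)
        result = post(API_BASE + "/getTaskResult",
                      {"clientKey": client_key, "taskId": task_id})
        if result.get("errorId"):
            raise RuntimeError("getTaskResult failed: %r" % (result,))
        if result.get("status") == "ready":
            return result["solution"]["gRecaptchaResponse"]
    raise TimeoutError("captcha was not solved in time")
```

Because the task carries the page URL and the site key, the worker is given a challenge issued for that site. The puzzle images they see may differ from yours, but the resulting token should still be one that the site's verification accepts.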
I recommend using Selenium: it lets you automate all the user actions and also inject whatever you need into the page DOM.
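A minimal sketch of the injection step with Selenium follows. It assumes the page uses the standard reCAPTCHA v2 widget, which renders a textarea with the id `g-recaptcha-response`, and that clicking the submit button is enough to send the form. The function name and the submit selector are hypothetical, and a site with a custom callback may need extra steps.

```python
# JavaScript run inside the page; arguments[0] is the solved token.
INJECT_JS = """
var field = document.getElementById('g-recaptcha-response');
if (!field) { return false; }
field.value = arguments[0];
return true;
"""


def submit_with_token(driver, login_url, token, submit_selector):
    """Load the login page, inject the token, then click submit.

    `driver` is a Selenium WebDriver (e.g. selenium.webdriver.Chrome()).
    """
    driver.get(login_url)
    # Fill in username/password here with find_element(...).send_keys(...).
    if not driver.execute_script(INJECT_JS, token):
        raise RuntimeError("g-recaptcha-response field not found on page")
    # "css selector" is the string value of selenium's By.CSS_SELECTOR.
    driver.find_element("css selector", submit_selector).click()
```

Since the form is submitted by the browser from the page it loaded, any hidden tokens on that page are sent along automatically, and you do not have to reproduce them by hand.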