
Missing __RequestVerificationToken to bypass google captcha with external solver

I'm trying to scrape some sites that require solving a captcha to log in. The best way I have found to do so is to use an external service like https://anti-captcha.com/, which has a person on the other side solve the captcha and sends back a hash value to verify the result.

As in the documentation, the process is:

  1. I send the site url and sitekey


  2. They assign the task to a worker, and after one or two minutes I get back the hash value to log in

The issue is that the actual request I need to make requires two other values besides that one:

* __RequestVerificationToken: This one appears on the login page, but the value sent with the login request is different, so there is some step in between that I'm missing.

* RecaptchaToken: There is no trace of this value on the login page. I suspect it is generated in the back end as an additional verification step, but I have not found any information about it.

My last concern about this process is that the anti-captcha service seems to be solving some generic captcha, not the same one that I'm seeing. I'm not sure whether that is an actual issue, though.

I believe you are talking about reCAPTCHA v2, which asks the user to select the images that contain a certain object.

How it works:

Based on the documentation, after the user solves the image puzzle and clicks "Verify", the widget sends the (encoded) answer to Google and gets back a token. That token is placed in a form field called "g-recaptcha-response" and submitted with the form. The site's server then posts it to https://www.google.com/recaptcha/api/siteverify to check whether the user's solution was accepted.
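For reference, the siteverify call is normally made by the site's own server, not by your scraper. A minimal sketch of that check, assuming the documented `secret` and `response` form parameters and the `success` field in the JSON reply (the function names here are made up for illustration):

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_siteverify_request(secret, token):
    """Build the POST request a site's backend would send to Google."""
    body = urlencode({"secret": secret, "response": token}).encode("ascii")
    return Request(SITEVERIFY_URL, data=body, method="POST")


def is_token_valid(response_body):
    """Read the 'success' flag from the JSON body returned by siteverify."""
    return bool(json.loads(response_body).get("success"))


def verify(secret, token):
    """Send the request and report whether the token was accepted."""
    with urlopen(build_siteverify_request(secret, token)) as resp:
        return is_token_valid(resp.read())
```

The `secret` is the site's private key, which you never see as a visitor. That is why the only thing a solver service can hand you is the g-recaptcha-response token itself.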

So a standard reCAPTCHA v2 setup needs only one token to validate the user's response. That is not the case you are facing here: you are facing a custom implementation intended specifically to make these sites harder to scrape or crawl by unwanted parties.

They have added two extra tokens that are uniquely generated and injected into the page that shows the captcha puzzle. By requiring those extra tokens, they make sure that the "g-recaptcha-response" comes from the same page that the user loaded in their browser.
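If you want to see which tokens the page actually carries, you can list its hidden form fields. This is only a sketch: it assumes the tokens are rendered as `<input type="hidden">` elements in the HTML, which may not hold if the site fills them in with JavaScript. The class and function names are made up for illustration.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of hidden form fields found in the given HTML."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


# Made-up sample markup, only to show the shape of the result.
sample = (
    '<form><input name="__RequestVerificationToken" type="hidden" '
    'value="abc123" /><input name="user" type="text"></form>'
)
print(extract_hidden_fields(sample))  # {'__RequestVerificationToken': 'abc123'}
```

Since the tokens are tied to the page that was loaded, the values are likely only valid together with the cookies of the same session, so read them from the same browser or session that will submit the login.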

You need to inject the g-recaptcha-response you get from the solving API into the same page you are visiting, then simulate the complete user interaction with the page.

I recommend using Selenium; it will help you automate all the user actions and also inject everything you need into the page DOM.
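A rough sketch of the injection step. It assumes the page uses the default reCAPTCHA v2 response field with the id `g-recaptcha-response`, and that `driver` is a Selenium WebDriver (anything with an `execute_script` method works). `inject_recaptcha_token` is a made-up helper name, not part of Selenium.

```python
# JavaScript run inside the page; arguments[0] is the token passed from Python.
INJECT_SCRIPT = """
var field = document.getElementById('g-recaptcha-response');
if (!field) { return false; }
field.value = arguments[0];
return true;
"""


def inject_recaptcha_token(driver, token):
    """Write the solved token into the page's response field.

    Returns True if the field was found and filled, False otherwise.
    """
    return bool(driver.execute_script(INJECT_SCRIPT, token))
```

You would call this after `driver.get(...)` has loaded the login page and before submitting the form through the browser, so that the page's own tokens and cookies go along with the request. Some sites also expect a JavaScript callback to fire once the captcha is solved; that part is site-specific and is not covered here.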
