
Reset PHP server browser cookies

I'm using file_get_html to scrape a website, but if I make a lot of requests I get "banned". To regain access I have to click through some anti-bot check, which unlocks the site, but only for the browser/PC where I do it. Is there any way to unblock the PHP "browser" (maybe by copying cookies from another browser?) without rebooting the router to get another IP address?

ANSWER FOR ORIGINAL QUESTION:

I do have a solution that I researched a while ago. It's not 100% legitimate, but what you're doing isn't very legitimate either, so I think it might be fine.

Also, the anti-bot service blocks you based on your IP, so clearing cookies does nothing to bypass it. You will still get the captcha page even after clearing them.

First, you know that the anti-bot page serves a Google reCAPTCHA, right?

So you can write some JavaScript to detect the reCAPTCHA div (in my case I use jQuery). If the div is detected, use a third-party captcha-solving API. In my case I use 2captcha: send them the captcha ID along with your own proxy (for reCAPTCHA a proxy is necessary; you can host one with the Squid proxy server) --> they send you back a g-response --> you post the g-response back to the site --> unblocked.

The price is very, very cheap. For me it's affordable, so I consider this a good solution.

I can't post my script due to restrictions, but you can find some examples on their website.

NEW EDIT: FOR YOUR SECOND QUESTION ASKED IN THE COMMENTS:

YOUR QUESTION: So basically, from what I understand, I can get that captcha ID (send it to myself) and solve it (like the 2captcha.com people do), and then the scrape script will work again. I think it's just basic PHP and HTML to do this. If you have any hints I'd be glad to hear them. Thank you!

To achieve this, I suggest you open any login page that has a reCAPTCHA and inspect it with the browser's built-in developer tools before writing any code. Personally I use the Opera browser.

  1. Open a login page that has a reCAPTCHA in your browser.
  2. In the browser, right click -> Inspect element -> open the Network tab.
  3. Now tick the "I am not a robot" checkbox. You'll see that two HTTP POST requests were made; find the one with the URL https://www.google.com/recaptcha/api2/userverify?k=SITE_KEY_HERE and look at the response. Google responds with a JSON object, something like {"uvresp":"A_LONG_STRING_HERE_blablablabla", ...}. A_LONG_STRING_HERE_blablablabla is exactly what we need.
  4. Now enter anything as the login and password, press login, and look at the Network tab again. You'll find that besides the username and password, another name/value pair is POSTed to the server: g-recaptcha-response=A_LONG_STRING_HERE_blablablabla . So whenever a reCAPTCHA appears, posting a valid g-recaptcha-response will pass the verification.

Now here are some suggestions for your code.

For the PHP server side:

After calling file_get_html, look for a <div> that belongs to the reCAPTCHA, such as <div class="rc-anchor-content"> . If a captcha is detected, stop scraping and wait for an answer: display a page with an input field where you will paste the g-recaptcha-response, plus a submit button.
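The detection step could be sketched roughly as follows. This is only an illustration: the function name looksLikeCaptchaPage and the list of markers are assumptions, not something taken from the target site, and the HTML is assumed to be available as a plain string (for example by casting the result of file_get_html to string). The rc-anchor-content div is usually rendered inside the widget's iframe, so the outer g-recaptcha container is probably the more reliable marker in the raw page source.

```php
<?php
// Hypothetical helper: returns true when the HTML contains markup that
// typically accompanies a reCAPTCHA widget. The marker list is an
// assumption and may need adjusting for the site being scraped.
function looksLikeCaptchaPage(string $html): bool
{
    $markers = ['g-recaptcha', 'rc-anchor-content', 'google.com/recaptcha'];
    foreach ($markers as $marker) {
        // stripos() is case-insensitive and returns false when not found.
        if (stripos($html, $marker) !== false) {
            return true;
        }
    }
    return false;
}
```

When this returns true, the scraper should stop sending requests and show the manual input page instead of retrying.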

Note: if you try to copy all the elements of <div class="g-recaptcha"> and display them on your own website, you might get a "site-key" error, because the captcha also checks which URL it is displayed from. You might be able to trick it with some JavaScript, or you could try the non-JavaScript version of reCAPTCHA (disable JavaScript and you'll see one; I'm not sure how it works). 2captcha.com might have done this, because I've seen a reCAPTCHA displayed on their worker page before. You can register a worker account and inspect their worker page.

For what you need to do on the desktop:

Open the target website in a normal browser (it must have the same IP as the PHP browser; you can use a proxy), click the checkbox, then copy the JSON response from Google. Submit this string to your PHP server. Remember that a g-response is only valid for 3-5 minutes, after which it expires.

Back to the PHP browser:

The PHP server receives your g-recaptcha-response string and POSTs it to the target website (don't forget the other POST values, if any) --> unblocked.
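A rough sketch of how the POST body might be assembled on the PHP side, including the expiry check. The function name buildVerificationBody, its parameters, and the 180-second default are assumptions for illustration (chosen from the low end of the 3-5 minute window mentioned above); the other form field names depend entirely on the target page.

```php
<?php
// Hypothetical helper: merges the manually obtained token into the other
// form fields and returns a URL-encoded POST body. Returns null when the
// token is empty or older than $maxAgeSeconds, so the caller can ask for
// a fresh one instead of posting a stale value.
function buildVerificationBody(
    array $formFields,
    string $token,
    int $obtainedAt,
    int $now,
    int $maxAgeSeconds = 180
): ?string {
    if ($token === '' || ($now - $obtainedAt) > $maxAgeSeconds) {
        return null;
    }
    $formFields['g-recaptcha-response'] = $token;
    // Pass the separator explicitly so the result does not depend on php.ini.
    return http_build_query($formFields, '', '&');
}
```

The returned string can then be sent as the body of a normal form POST, for example via cURL's CURLOPT_POSTFIELDS option.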

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.

 