
Reset PHP server browser cookies

I'm using file_get_html to scrape a website, but if I make a lot of requests I get "banned". To regain access I have to click through some anti-bot check, which unlocks the site, but only for the browser/PC where I do it. My question: is there any way to unblock the PHP "browser" (maybe by copying cookies from another browser?) without rebooting the router to get another IP address?

ANSWER FOR ORIGINAL QUESTION:

I have a solution that I researched a while ago, though it's not 100% legitimate. Then again, what you're doing isn't very legit either, so I think it might be fine.

Also, the anti-bot system blocks you based on your IP, so clearing cookies does nothing to bypass it. You will still get the captcha page even after you clear the cookies.

First, you know that the anti-bot page serves a Google reCAPTCHA, right?

So you can write some JavaScript to detect the Google reCAPTCHA div (in my case I use jQuery); if the captcha div is detected, use a third-party captcha-solving API. In my case I use 2captcha --> send them the captcha id along with your own proxy (for reCAPTCHA a proxy is necessary; you can host one with the Squid proxy server) --> they send you back a g-response --> you post the g-response back to the site --> unblocked.

The price is very, very cheap; for me it's affordable, so I treat this as a good solution.

I can't post my script due to restrictions, but you can find some examples on their website.

NEW EDIT: FOR YOUR SECOND QUESTION, ASKED IN A COMMENT:

YOUR QUESTION: So basically, from what I understand, I can get that captcha id (send it to myself) and solve it (like the 2captcha.com people do), and then the scrape script will work again. I think it's just basic PHP and HTML to do this; if you have any hints I'm glad to hear them. Thank you!

To achieve this, I suggest you open any login page that has a reCAPTCHA with your browser's built-in developer tools and inspect it a little before writing any code. Personally I use the Opera browser.

  1. Open a login page that has a reCAPTCHA in your browser.
  2. Browser -> right click -> Inspect element -> open the Network tab.
  3. Now check the "I am not a robot" checkbox. You'll see that two HTTP POST requests have been made; find the one with the URL https://www.google.com/recaptcha/api2/userverify?k=SITE_KEY_HERE and look at its response. You'll see that Google responded with a JSON object, something like {"uvresp":"A_LONG_STRING_HERE_blablablabla", ...}. A_LONG_STRING_HERE_blablablabla is exactly what we need.
  4. Now enter anything as login and password, press login, and look at the Network tab again. Besides the username and password, you'll find one more key/value pair being POSTed to the server: g-recaptcha-response=A_LONG_STRING_HERE_blablablabla . So whenever a reCAPTCHA appears, posting g-recaptcha-response will pass the verification.

Now here are some suggestions for your code.

For the PHP server side:

After calling file_get_html, look for one of the reCAPTCHA <div> elements, such as <div class="rc-anchor-content"> . If a captcha is detected, stop scraping and wait for an answer: display a page with an input field, into which you will paste the g-recaptcha-response, and a submit button.
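A minimal sketch of the detection step, assuming the fetched page is available as a raw HTML string. The function name looksLikeCaptchaPage and the default marker list are made up for illustration; the two class names are the ones mentioned in this answer. It uses a rough regex heuristic rather than a full HTML parser, so treat it as a starting point only.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): returns true when any
// <div> in the HTML carries one of the marker classes.
function looksLikeCaptchaPage(
    string $html,
    array $markers = ['g-recaptcha', 'rc-anchor-content']
): bool {
    // Collect the class attribute value of every <div ...> tag.
    if (!preg_match_all('/<div\b[^>]*\bclass\s*=\s*(["\'])(.*?)\1/is', $html, $m)) {
        return false; // no <div> with a class attribute at all
    }
    foreach ($m[2] as $classAttr) {
        // Compare whole class tokens so "my-g-recaptcha-x" does not match.
        $tokens = preg_split('/\s+/', trim($classAttr));
        if (array_intersect($markers, $tokens)) {
            return true;
        }
    }
    return false;
}
```

When this returns true, the scraper would stop issuing requests and show the input form described above instead of retrying.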

Note: if you try to copy all the elements of <div class="g-recaptcha"> and display them on your own website, you might get a wrong "site-key" error, because the captcha also checks which URL it is displayed from. You might be able to trick it with some JavaScript, though (or you could try the non-JavaScript version of reCAPTCHA: disable JavaScript and you'll see one; I'm not sure how it works). 2captcha.com may have done this, because I've seen a reCAPTCHA displayed on their worker page before; you can register a worker account and inspect their worker page.

For what you need to do on the desktop:

Open the target website in a normal browser (it must have the same IP as the PHP browser; you can use a proxy), click the checkbox, then copy the JSON response from Google. Submit this string to your PHP server. Remember that a g-response is only valid for 3-5 minutes, after which it expires.
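Because the token expires quickly, the PHP side may want to check its age before using it. A small sketch, assuming you record a Unix timestamp when the string is submitted; isTokenFresh and the 180-second default (the low end of the 3-5 minute window mentioned above) are illustrative choices, not documented limits.

```php
<?php
// Hypothetical helper: true while a pasted token is still young enough to try.
// $receivedAt and $now are Unix timestamps in seconds.
function isTokenFresh(int $receivedAt, int $now, int $maxAgeSeconds = 180): bool
{
    $age = $now - $receivedAt;
    // Reject negative ages (clock problems) as well as tokens that are too old.
    return $age >= 0 && $age <= $maxAgeSeconds;
}
```

For example, isTokenFresh($submittedAt, time()) could gate the POST step, asking for a fresh string when it returns false.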

Back to the PHP browser:

The PHP server receives your g-recaptcha-response string; POST it to the target website (don't forget the other POST values, if any) --> unblocked.
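A rough sketch of assembling that POST with PHP's built-in stream functions. buildCaptchaPostOptions is a hypothetical helper, and the field names other than g-recaptcha-response, as well as the URL in the usage comment, are placeholders; the real form may expect different fields.

```php
<?php
// Hypothetical helper: builds stream context options for a form POST that
// carries the token plus any other fields the target form expects.
function buildCaptchaPostOptions(string $token, array $otherFields = []): array
{
    // The token is merged last so it cannot be overwritten by $otherFields.
    $body = http_build_query(
        array_merge($otherFields, ['g-recaptcha-response' => $token])
    );
    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n",
            'content' => $body,
        ],
    ];
}

// Usage (placeholder URL and fields):
// $context = stream_context_create(buildCaptchaPostOptions($token, ['username' => 'u']));
// $html = file_get_contents('https://example.com/login', false, $context);
```

Keeping the request construction separate from the actual fetch makes it easy to inspect the body before anything is sent.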

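Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.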

 