
Selenium gets response code of 429 but Firefox private mode does not

I used Selenium in Python 3 to open a page. It does not open under Selenium, but it does open in a Firefox private window.

What is the difference, and how can I fix it?

from selenium import webdriver
from time import sleep

driver = webdriver.Firefox()
driver.get('https://google.com')  # visit Google first so that it sets a cookie
print(driver.get_cookies())  # check that the Google cookies were set
sleep(3.0)
url = 'https://www.realestate.com.au/buy/in-sydney+cbd%2c+nsw/list-1'
driver.get(url)  # this is the request that comes back with 429

Creating a Google cookie is not necessary. The cookie is not present in a Firefox private window either, and the page loads without it there. Under Selenium, however, the behavior is different.

I also see that the website returns an [HTTP/2 429 Too Many Requests 173ms] status and the page is blank white. This does not happen in Firefox private mode.

UPDATE:

I turned on the persistent log. Firefox in private mode receives a 429 response too, but it seems the JavaScript then resumes from another URL. This only happens on the first request.

Under Selenium, however, the request does not survive the 429 response. It does report something to the cdndex website. I have blocked that website, so you do not see the request go through there. This is still a difference in behavior between Firefox and Selenium.

Selenium with persistent log: [screenshot: Selenium network panel]

Firefox with persistent log: [screenshot: Firefox network panel]

This is just my hunch after working with Selenium and WebDriver for a while: I suspect that the default user agent Selenium uses is something the server side recognizes, and that the server responds with the 429 status code and a blank page as a result.

Try setting the user agent to something reasonable and/or stop Selenium from interfering with the browser defaults.
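As a rough sketch of the first suggestion, the user agent can be overridden through a Firefox preference before the driver starts. The helper names `build_firefox_prefs` and `make_firefox` are made up for this example, and whether a different user agent is enough to avoid the 429 on this site is an assumption, not something verified here.

```python
def build_firefox_prefs(user_agent):
    """Return the Firefox preferences to apply (hypothetical helper).

    general.useragent.override is the Firefox preference that replaces
    the User-Agent header and navigator.userAgent.
    """
    return {"general.useragent.override": user_agent}


def make_firefox(prefs):
    """Start Firefox through Selenium with the given preferences set."""
    # Imported here so the preference helper above can be used and
    # inspected on a machine without Selenium or geckodriver installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in prefs.items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

To try it, call `make_firefox(build_firefox_prefs(ua))` with `ua` set to the user agent string your regular Firefox reports, then load the URL from the question with the returned driver.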

Another tip is to look at the request using Wireshark or a similar tool to see exactly what is sent over the wire.
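If Wireshark is more than you need, a small local server that records request headers gives a quick way to compare what the two browsers send. This is only a sketch using the standard library: `HeaderEcho`, `start_header_echo`, and `captured` are names invented for the example, and it shows plain HTTP headers only, not TLS-level or JavaScript-level differences.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

captured = []  # one dict of lower-cased request headers per request


class HeaderEcho(BaseHTTPRequestHandler):
    def do_GET(self):
        # Record the headers and echo them back as plain text.
        headers = {name.lower(): value for name, value in self.headers.items()}
        captured.append(headers)
        body = "\n".join(f"{k}: {v}" for k, v in headers.items()).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_header_echo(port=0):
    """Serve on 127.0.0.1 in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), HeaderEcho)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Start the server, open `http://127.0.0.1:<port>/` once from the Selenium-driven Firefox and once from a private window, and compare the two entries in `captured`.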

429 Too Many Requests

The HTTP 429 Too Many Requests response status code indicates that the user has sent too many requests within a short period of time. The 429 status code is intended for use with rate-limiting schemes.
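For a client that is genuinely being rate limited, the usual reaction to a 429 is to wait and retry. The sketch below assumes the server may send a `Retry-After` header with a number of seconds and falls back to exponential backoff otherwise; `retry_delay` and `fetch_with_backoff` are illustrative names. When the 429 is really a bot-detection response, as suspected in this question, waiting may not help.

```python
import time
import urllib.error
import urllib.request


def retry_delay(retry_after, attempt, base=1.0, cap=60.0):
    """Seconds to wait before retrying after attempt number `attempt` (0-based).

    Uses Retry-After when it is a plain number of seconds. The HTTP-date
    form of the header is not parsed here and falls back to backoff.
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after), cap)
    return min(base * (2 ** attempt), cap)


def fetch_with_backoff(url, max_attempts=5):
    """GET `url`, sleeping and retrying while the server answers 429."""
    for attempt in range(max_attempts):
        try:
            with urllib.request.urlopen(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not a 429, or the final 429.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            time.sleep(retry_delay(err.headers.get("Retry-After"), attempt))
```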


Root Cause

When your server detects that a user agent is trying to access a specific page too often in a short period of time, it triggers a rate-limiting feature. The most common example of this is when a user (or an attacker) repeatedly tries to log into a web application.

The server can also identify a user with cookies, rather than by their login credentials. Requests may also be counted on a per-request basis, across your server, or across several servers. So there are a variety of situations that can result in you seeing an error like one of these:

  • 429 Too Many Requests
  • 429 Error
  • HTTP 429
  • Error 429 (Too Many Requests)
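To make the server-side mechanism described above concrete, here is a minimal sliding-window limiter. It is an illustration only, not what realestate.com.au runs: `SlidingWindowLimiter` is a made-up class, and `client_id` stands for whatever the server keys on, such as a cookie value or an IP address.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client_id -> timestamps of accepted requests

    def status_for(self, client_id, now=None):
        """Return the HTTP status the server would answer with: 200 or 429."""
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return 429
        hits.append(now)
        return 200
```

With `SlidingWindowLimiter(2, 10)`, a third request from the same client inside ten seconds gets 429, while a different client is counted separately.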

This use case

This use case seems to be a classic case of a Browsing Context initiated by Selenium-driven GeckoDriver getting detected as a bot, due to the fact that:

Selenium identifies itself
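One visible way this happens is the `navigator.webdriver` property, which the WebDriver specification defines as true while the browser is under automation. The sketch below reads it, together with the user agent, from a running session so the values can be compared with a normal Firefox window; `automation_fingerprint` is a name invented for this example, and it is an assumption that the site in question checks these particular values.

```python
def automation_fingerprint(driver):
    """Return values a page script could read to spot automation.

    `driver` is any Selenium WebDriver instance, for example the
    webdriver.Firefox() object from the question.
    """
    return {
        "webdriver": driver.execute_script("return navigator.webdriver"),
        "user_agent": driver.execute_script("return navigator.userAgent"),
    }
```

In a regular Firefox window, evaluating `navigator.webdriver` in the developer console gives `false`, so a `True` under `"webdriver"` here is one signal a site can use.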


