简体   繁体   中英

Selenium gets response code of 429 but firefox private mode does not

Used Selenium in python3 to open a page. It does not open under selenium but it does open under firefox private page.

What is the difference and how to fix it?

from selenium import webdriver
from time import sleep

driver = webdriver.Firefox()
driver.get('https://google.com') # creating a google cookie
driver.get_cookies() # check google gets cookies
sleep(3.0)
url='https://www.realestate.com.au/buy/in-sydney+cbd%2c+nsw/list-1'
driver.get(url)

Creating a google cookie is not necessary. It is not there under firefox private page either but it works without it. However, under Selenium the behavior is different.

I also see the website returns [HTTP/2 429 Too Many Requests 173ms] status and the page is blank white. It does not happen in firefox private mode.

UPDATE:

I turned on the persistent log. Firefox on private mode will receive a 429 response too but it seems the javascript will resume from another url. It only happens for the first time.

On selenium however, the request does not survive the 429 response. It does report something to cdndex website. I have blocked that website so you o not see the request go through there. This is still a different behavior between firefox and selenium.

Selenium with persistent log: 硒网络

Firefox with persistent log: 火狐网络

This is just my huch after working with selenium and webdriver for a while; I suspect that it is due to the default user agent of selenium being set to something lame by default and that the server side recognizes this and provides you with a silly HTTP code and a blank page as a result.

Try setting the user agent to something reasonable and/or disable selenium's interfering with defaults.

Another tips is to look at the request using wireshark or similar to see exactly what is sent over the wire.

429 Too Many Requests

The HTTP 429 Too Many Requests response status code indicates the user has sent too many requests within a short period of time. The 429 status code is intended for use with rate-limiting schemes.


Root Cause

When your server detects that a user agent is trying to access a specific page too often in a short period of time, it triggers a rate-limiting feature. The most common example of this is when a user (or an attacker) repeatedly tries to log into a web application.

The server can also identify a with cookies, rather than by their login credentials. Requests may also be counted on a per-request basis, across your server, or across several servers. So there are a variety of situations that can result in you seeing an error like one of these:

  • 429 Too Many Requests
  • 429 Error
  • HTTP 429
  • Error 429 (Too Many Requests)

This usecase

This usecase seems to be a classical case of Selenium driven GeckoDriver initiated Browsing Context getting detected as a bot due to the fact:

Selenium identifies itself


References

You can find a couple of relevant detailed discussions in:

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM