What's the difference between the HTTPX and requests modules in Python that is causing this error?

Not much to say: I'm sending a request to some URL. One HTTP client returns 403, the other succeeds. There is no difference in the headers. What else can be different?

>>> print(httpx.get(url).status_code)
200
>>> print(requests.get(url).status_code)
403

My guess would be that the website is blocking requests made with the requests library.

How is that possible?

Every request you make leaves behind a fingerprint of the device that made it. Among other information, this fingerprint contains the User-Agent header.

When you make a request with the requests library, the default User-Agent value is something like "python-requests/2.28.0", while for the httpx library it is something like "python-httpx/0.23.0".
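To see what each library sends on your own machine, a small sketch like the following may help. It assumes requests and/or httpx are installed and skips whichever one is missing; the helper name default_user_agents is only illustrative. The exact version numbers will depend on what you have installed.

```python
def default_user_agents():
    """Collect the default User-Agent string of each library that is installed."""
    agents = {}
    try:
        import requests
        # requests exposes its default User-Agent through a utility function.
        agents["requests"] = requests.utils.default_user_agent()
    except ImportError:
        pass  # requests is not installed; skip it
    try:
        import httpx
        # An httpx client carries its default headers, including User-Agent.
        with httpx.Client() as client:
            agents["httpx"] = client.headers["user-agent"]
    except ImportError:
        pass  # httpx is not installed; skip it
    return agents


for library, user_agent in default_user_agents().items():
    print(f"{library}: {user_agent}")
```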

Based on the information in your fingerprint, a website may choose to handle your request differently from others, for example by returning an error page instead of the real content.

But why?

Python is often used for web scraping, and requests is one of its most popular HTTP libraries, so many people use it in their projects. Because of that, many websites choose not to serve real content to requests with a "python-requests/*" user agent.
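What such a rule might look like on the server side can be sketched with the standard library alone. This is only an illustration: the blocking rule, the handler name UserAgentFilter, and the local test server are all assumptions, and real sites usually rely on a CDN or firewall with more elaborate checks. The client side uses urllib and simply sets the two User-Agent strings by hand, so neither requests nor httpx is needed to run it.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BLOCKED_PREFIX = "python-requests/"  # hypothetical rule, for illustration only


class UserAgentFilter(BaseHTTPRequestHandler):
    def do_GET(self):
        # Decide the response purely from the User-Agent header.
        user_agent = self.headers.get("User-Agent", "")
        blocked = user_agent.startswith(BLOCKED_PREFIX)
        body = b"blocked" if blocked else b"real content"
        self.send_response(403 if blocked else 200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def status_for(url, user_agent):
    """Send one GET with the given User-Agent and return the status code."""
    # An empty ProxyHandler makes sure the local request bypasses any proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with opener.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def run_demo():
    """Start a throwaway local server and query it with both User-Agents."""
    server = HTTPServer(("127.0.0.1", 0), UserAgentFilter)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"
    try:
        return {
            agent: status_for(url, agent)
            for agent in ("python-requests/2.28.0", "python-httpx/0.23.0")
        }
    finally:
        server.shutdown()
        server.server_close()


print(run_demo())
```

With this rule the first user agent gets 403 and the second gets 200, which mirrors the behaviour described in the question.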

Handling requests costs money, so websites try to serve content only to real people rather than bots, and bot traffic tends to be much larger than human traffic. Some site owners also simply want to stop third parties from using their content in ways they don't like or don't know about.

The httpx library identifies itself in the same way, but it is not as widely known as requests, so you'll find more websites that do not block it.

Is it possible to bypass blocking?

Yes, there are many online resources on the topic. Just search for something like "avoid requests blocking".
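The simplest thing to try is replacing the default User-Agent. The sketch below assumes requests and httpx are installed; the helper names and the idea of passing in your own User-Agent string are illustrative, not something the site in question is known to accept. A site may also look at other signals, so this alone may not be enough.

```python
def build_headers(user_agent):
    """Return headers that replace the library's default User-Agent."""
    return {"User-Agent": user_agent}


def status_with_requests(url, user_agent):
    # Imported here so build_headers stays usable without the library installed.
    import requests
    response = requests.get(url, headers=build_headers(user_agent), timeout=10)
    return response.status_code


def status_with_httpx(url, user_agent):
    import httpx
    response = httpx.get(url, headers=build_headers(user_agent), timeout=10)
    return response.status_code
```

For example, status_with_requests(url, user_agent) with the User-Agent string copied from your own browser shows whether the 403 goes away once the "python-requests/*" value is no longer sent.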
