
Different results between requests.get, aiohttp GET, and the httpx module

I am trying to access a site that has bot protection.

With the following code using requests, I can access the site:

    request = requests.get(url, headers={**HEADERS, 'Cookie': cookies})

I get the desired HTML. But when I use aiohttp,

    async def get_data(session: aiohttp.ClientSession, url, cookies):
        async with session.get(url, timeout=5, headers={**HEADERS, 'Cookie': cookies}) as response:
            text = await response.text()
            print(text)

I get the bot-prevention page as the response instead.

These are the headers I use for all the requests:

    HEADERS = {
        'User-Agent': 'PostmanRuntime/7.29.0',
        'Host': 'www.dnb.com',
        'Connection': 'keep-alive',
        'Accept': '*/*',
        'Accept-Encoding': 'gzip, deflate, br'
    }

I have compared the request headers of requests.get and aiohttp, and they are identical.

Is there a reason the results differ? If so, what is it?

EDIT: I've also checked the httpx module. The problem occurs there as well, with both httpx.Client() and httpx.AsyncClient(). Even this plain, synchronous (non-async) call doesn't work:

    response = httpx.request('GET', url, headers={**HEADERS, 'Cookie': cookies})

I tried capturing packets with Wireshark to compare requests and aiohttp.

Server:

    import http.server

    server = http.server.HTTPServer(("localhost", 8080),
                                    http.server.SimpleHTTPRequestHandler)
    server.serve_forever()
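
Wireshark works for this local test because the traffic is plain HTTP. A standard-library alternative is a server that echoes back the request line and headers in the order it received them. The sketch below assumes that approach. EchoHeadersHandler and start_echo_server are illustrative names and do not come from the answer.

```python
import http.server
import threading
import urllib.request


class EchoHeadersHandler(http.server.BaseHTTPRequestHandler):
    """Reply with the request line and headers in the order they were received."""

    def do_GET(self):
        # self.headers preserves the client's header order (and duplicates)
        lines = [self.requestline] + [f"{name}: {value}" for name, value in self.headers.items()]
        body = ("\n".join(lines) + "\n").encode("latin-1")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=latin-1")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the per-request log lines


def start_echo_server(host="localhost", port=0):
    """Run the echo server in a daemon thread; port 0 lets the OS pick a free port."""
    server = http.server.HTTPServer((host, port), EchoHeadersHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_echo_server()
    host, port = server.server_address[:2]
    # Any client works here: point requests, aiohttp or httpx at the same URL
    with urllib.request.urlopen(f"http://{host}:{port}/") as resp:
        print(resp.read().decode("latin-1"))
    server.shutdown()
    server.server_close()
```

Starting it with start_echo_server(port=8080) lets the requests and aiohttp snippets below run unchanged. Each client then sees its own headers in the response body.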

With requests:

    import requests
    url = 'http://localhost:8080'
    HEADERS = {'Content-Type': 'application/json'}
    cookies = ''
    request = requests.get(url,headers={**HEADERS,'Cookie': cookies})

Packet sent by requests:

    GET / HTTP/1.1
    Host: localhost:8080
    User-Agent: python-requests/2.27.1
    Accept-Encoding: gzip, deflate, br
    Accept: */*
    Connection: keep-alive
    Content-Type: application/json
    Cookie: 

With aiohttp:

    import aiohttp
    import asyncio
    
    url = 'http://localhost:8080'
    HEADERS = {'Content-Type': 'application/json'}
    cookies = ''
    async def get_data(session: aiohttp.ClientSession,url,cookies):
        async with session.get(url,timeout = 5,headers={**HEADERS,'Cookie': cookies}) as response:
            text = await response.text()
            print(text)
    
    async def main():
        async with aiohttp.ClientSession() as session:
            await get_data(session,url,cookies)
    
    asyncio.run(main())

Packet sent by aiohttp:

    GET / HTTP/1.1
    Host: localhost:8080
    Content-Type: application/json
    Cookie: 
    Accept: */*
    Accept-Encoding: gzip, deflate
    User-Agent: Python/3.10 aiohttp/3.8.1
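
Comparing the two captures header by header shows what actually differs. The sketch below uses only the standard library. The two header lists are copied from the captures above, and diff_headers is a hypothetical helper that matches names case-insensitively.

```python
# Headers copied from the two captures above, in wire order
REQUESTS_HEADERS = [
    ("Host", "localhost:8080"),
    ("User-Agent", "python-requests/2.27.1"),
    ("Accept-Encoding", "gzip, deflate, br"),
    ("Accept", "*/*"),
    ("Connection", "keep-alive"),
    ("Content-Type", "application/json"),
    ("Cookie", ""),
]
AIOHTTP_HEADERS = [
    ("Host", "localhost:8080"),
    ("Content-Type", "application/json"),
    ("Cookie", ""),
    ("Accept", "*/*"),
    ("Accept-Encoding", "gzip, deflate"),
    ("User-Agent", "Python/3.10 aiohttp/3.8.1"),
]


def diff_headers(a, b):
    """Compare two header lists: missing names, differing values, and relative order."""
    da = {name.lower(): value for name, value in a}
    db = {name.lower(): value for name, value in b}
    only_a = [k for k in da if k not in db]
    only_b = [k for k in db if k not in da]
    changed = {k: (da[k], db[k]) for k in da if k in db and da[k] != db[k]}
    # Relative order of the headers that both sides send
    common = set(da) & set(db)
    same_order = [k for k in da if k in common] == [k for k in db if k in common]
    return {"only_a": only_a, "only_b": only_b, "changed": changed, "same_order": same_order}


result = diff_headers(REQUESTS_HEADERS, AIOHTTP_HEADERS)
for key, value in result.items():
    print(key, value)
```

For this pair the output is:

- aiohttp omits Connection. keep-alive is already the HTTP/1.1 default, so this should not change the meaning of the request.
- aiohttp sends a different User-Agent.
- aiohttp drops br from Accept-Encoding.
- aiohttp orders the common headers differently.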

If the site accepts the requests sent by the requests library, you could try making the aiohttp request identical by setting the headers explicitly:

    HEADERS = {
        'User-Agent': 'python-requests/2.27.1',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept': '*/*',
        'Connection': 'keep-alive',
        'Content-Type': 'application/json',
        'Cookie': '',
    }

If you haven't already, I suggest capturing your actual request with Wireshark to make sure aiohttp isn't modifying your headers.
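
A capture only shows readable headers for plain HTTP. The real site is presumably served over HTTPS, so Wireshark would show encrypted TLS records unless it is set up to decrypt them.

aiohttp's client tracing is another option, assuming aiohttp 3.8 or newer. Its on_request_headers_sent signal receives the headers aiohttp actually serialized, including the ones it adds itself such as Host. The sketch below relies on that signal. The function names are hypothetical, and HEADERS is the dict from the answer above.

```python
import asyncio

import aiohttp

# Headers from the answer above, listed in the order requests sends them
HEADERS = {
    "User-Agent": "python-requests/2.27.1",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept": "*/*",
    "Connection": "keep-alive",
    "Content-Type": "application/json",
    "Cookie": "",
}


async def on_headers_sent(session, trace_config_ctx, params):
    # params.headers is the header set aiohttp wrote for this request;
    # trace_request_ctx is the dict passed to session.get() below
    trace_config_ctx.trace_request_ctx["sent"] = list(params.headers.items())


async def fetch_with_sent_headers(url, headers):
    """GET url and return (status, list of headers aiohttp actually sent)."""
    trace_config = aiohttp.TraceConfig()
    trace_config.on_request_headers_sent.append(on_headers_sent)
    captured = {}
    async with aiohttp.ClientSession(trace_configs=[trace_config]) as session:
        async with session.get(url, headers=headers, trace_request_ctx=captured) as resp:
            await resp.read()
            return resp.status, captured["sent"]
```

Usage: call asyncio.run(fetch_with_sent_headers(url, {**HEADERS, 'Cookie': cookies})) and print the returned list. If you keep br in Accept-Encoding, aiohttp can only decode Brotli-compressed responses when the Brotli package is installed, as far as I know.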

You can also try other User-Agent strings, or send the headers in a different order. Header order is not supposed to matter, but some sites check it anyway for bot protection (for example, in this question).
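
Testing whether header order is what the site reacts to requires a client that sends headers in exactly the order you choose. Assuming the standard library is acceptable for this experiment, http.client can do it:

- putrequest() with skip_host and skip_accept_encoding suppresses the two headers it would otherwise add.
- Each putheader() call is written to the wire in sequence.

get_with_exact_headers is a hypothetical helper name.

```python
import http.client


def get_with_exact_headers(host, port, path, headers):
    """Send a GET whose headers go on the wire exactly in the given order.

    headers is a list of (name, value) pairs. Because skip_host=True,
    the list must include Host itself (HTTP/1.1 requires it).
    """
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        # Suppress the Host and Accept-Encoding headers http.client adds by default
        conn.putrequest("GET", path, skip_host=True, skip_accept_encoding=True)
        for name, value in headers:  # written in list order
            conn.putheader(name, value)
        conn.endheaders()
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()
```

http.client does not decompress anything. If Accept-Encoding lists gzip or br, the body comes back compressed. For an HTTPS site, http.client.HTTPSConnection takes the place of HTTPConnection.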
