
Can't fetch tabular content from a webpage using requests

I would like to scrape the tabular content from the landing page of this website. There are 100 rows on its first page. When I observe the network activity in dev tools, I can see GET requests being issued to https://io6.dexscreener.io/u/ws3/screener3/ with the appropriate parameters, and these return JSON content.

However, when I try to mimic those requests with the following code:

import requests

url = 'https://io6.dexscreener.io/u/ws3/screener3/'
params = {
    'EIO': '4',
    'transport': 'polling',
    't': 'NwYSrFK',
    'sid': 'ztAOHWOb-1ulTq-0AQwi',
}

headers = {
    'accept': '*/*',
    'referer': 'https://dexscreener.com/',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36'
}
with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(url,params=params)
    print(res.content)

I get this response:

`{"code":3,"message":"Bad request"}`

How can I get a response containing the tabular content from that webpage?

Here is a quick and dirty piece of Python code that performs the initial handshake, sets up the websocket connection, and then downloads the data in JSON format indefinitely. I haven't tested this code extensively, and I am not sure which steps of the handshake are strictly necessary, but I have mimicked the browser's behaviour and it seems to work fine:

import requests
from websocket import create_connection
import json

s = requests.Session()

headers =   {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
url = 'https://dexscreener.com/ethereum'

resp = s.get(url,headers=headers)
print(resp)

step1 = s.get('https://io3.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-Os')
step2 = s.get('https://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-S5')

obj = json.loads(step2.text[1:])
code = obj['sid']

payload = '40/u/ws/screener/consolidated/platform/ethereum/h1/top/1,'

step3 = s.post(f'https://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-Xt&sid={code}',data=payload)
step4 = s.get(f'https://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-Xu&sid={code}')
# Engine.IO v4 separates polling packets with the record separator character '\x1e'
d = step4.text.replace('\x1e','').replace('42/u/ws/screener/consolidated/platform/ethereum/h1/top/1,','').replace(payload,'')

start = '["screener",'
end = ']["latestBlock",'

dirty = d[d.find(start)+len(start):d.rfind(end)].strip()
clean = json.loads(dirty)
print(clean)

# Extra headers for the websocket handshake. websocket-client generates the
# Host, Upgrade, Connection and Sec-WebSocket-* headers itself, so only the
# remaining browser headers are passed here.
headers = {
    'Accept-Language':'en-ZA,en;q=0.9,en-GB;q=0.8,en-US;q=0.7,de;q=0.6',
    'Cache-Control':'no-cache',
    'Origin':'https://dexscreener.com',
    'Pragma':'no-cache',
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36'
    }

# Then create a connection to the tunnel (the keyword argument is `header`, which takes a dict or list)
ws = create_connection(f"wss://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=websocket&sid={code}",header=headers)

# Then send the initial messages through the tunnel
ws.send('2probe')
ws.send('5')

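# Here you will see the messages returned from the tunnel.
# If the connection closes, ws.recv() raises and the loop ends.
while True:
    msg = ws.recv()
    if msg == '2':
        # Engine.IO v4 ping from the server; reply with a pong to keep the session alive
        ws.send('3')
        continue
    try:
        json_data = json.loads(msg.replace('42/u/ws/screener/consolidated/platform/ethereum/h1/top/1,',''))
    except ValueError:
        # Not a JSON event frame (for example the '3probe' reply); skip it
        continue
    print(json_data)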
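
The string replacements above work, but they hard-code the namespace and the event names. As a rough sketch, the same decoding can be done by splitting the polling body into packets and parsing each Socket.IO event separately. The helper names `parse_handshake` and `iter_events` and the sample payload are my own illustrations, not part of the site's API, and I am assuming the frames carry no ack ids or binary attachments:

```python
import json

RECORD_SEPARATOR = "\x1e"  # Engine.IO v4 joins polling packets with this character


def parse_handshake(body):
    """Return the session info from an Engine.IO 'open' packet such as '0{"sid": ...}'."""
    if not body.startswith("0"):
        raise ValueError("not an Engine.IO open packet")
    return json.loads(body[1:])


def iter_events(body):
    """Yield (namespace, event_name, args) for each Socket.IO event in a polling body."""
    for packet in body.split(RECORD_SEPARATOR):
        # '4' marks an Engine.IO message, '2' a Socket.IO EVENT; skip everything else
        if not packet.startswith("42"):
            continue
        rest = packet[2:]
        namespace = "/"
        if rest.startswith("/"):
            # A custom namespace is written before the JSON array, followed by a comma
            namespace, _, rest = rest.partition(",")
        event = json.loads(rest)
        yield namespace, event[0], event[1:]


# Made-up payload for illustration only; the real field names are not shown here
sample = RECORD_SEPARATOR.join([
    '40/u/ws/demo,{"sid":"abc"}',
    '42/u/ws/demo,["screener",{"pairs":[1,2]}]',
    '42/u/ws/demo,["latestBlock",123]',
])
events = list(iter_events(sample))
for namespace, name, args in events:
    print(namespace, name, args)
```

With the script above, `parse_handshake(step2.text)["sid"]` would take the place of the `json.loads(step2.text[1:])` line, and `iter_events(step4.text)` would take the place of the `start`/`end` slicing, assuming the server's response has the shape the original code expects.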


 