I would like to scrape the tabular content from the landing page of this website. There are 100 rows on its first page. When I observe the network activity in dev tools, I can see GET requests being issued to this URL: https://io6.dexscreener.io/u/ws3/screener3/
with appropriate parameters, which end up producing JSON content.
However, when I try to mimic those requests with the following code:
import requests

url = 'https://io6.dexscreener.io/u/ws3/screener3/'
params = {
    'EIO': '4',
    'transport': 'polling',
    't': 'NwYSrFK',
    'sid': 'ztAOHWOb-1ulTq-0AQwi',
}
headers = {
    'accept': '*/*',
    'referer': 'https://dexscreener.com/',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36'
}
with requests.Session() as s:
    s.headers.update(headers)
    res = s.get(url, params=params)
    print(res.content)
I get this response:
`{"code":3,"message":"Bad request"}`
How can I get a response containing the tabular content from that webpage?
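For context, the `sid` parameter in these requests is not a fixed value. An Engine.IO server issues one per session in its handshake response, which is the packet-type digit `0` followed by a JSON object. Below is a minimal sketch of reading it out, assuming that framing. The function name `parse_open_packet` and the sample body are hypothetical and were not captured from this site:

```python
import json


def parse_open_packet(body: str) -> dict:
    """Parse an Engine.IO v4 'open' packet: the digit 0 followed by JSON."""
    if not body.startswith("0"):
        raise ValueError(f"not an open packet: {body[:20]!r}")
    return json.loads(body[1:])


# Made-up sample, shaped like an Engine.IO handshake response.
sample = '0{"sid":"abc123","upgrades":["websocket"],"pingInterval":25000,"pingTimeout":20000}'
handshake = parse_open_packet(sample)
print(handshake["sid"])
```

The `sid` obtained this way is what later polling and websocket requests would carry in place of a value copied from dev tools.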
Here is a quick and dirty piece of Python code that performs the initial handshake, sets up the websocket connection, and then downloads the data in JSON format indefinitely. I haven't tested it extensively, and I am not sure exactly which steps of the handshake are necessary, but it mimics the browser's behaviour and seems to work fine:
import requests
from websocket import create_connection
import json
s = requests.Session()
headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
url = 'https://dexscreener.com/ethereum'
resp = s.get(url,headers=headers)
print(resp)
step1 = s.get('https://io3.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-Os')
step2 = s.get('https://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-S5')
obj = json.loads(step2.text[1:])
code = obj['sid']
payload = '40/u/ws/screener/consolidated/platform/ethereum/h1/top/1,'
step3 = s.post(f'https://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-Xt&sid={code}',data=payload)
step4 = s.get(f'https://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=polling&t=Nwof-Xu&sid={code}')
# Strip the Engine.IO record separator (\x1e) and the Socket.IO packet prefixes
d = step4.text.replace('\x1e','').replace('42/u/ws/screener/consolidated/platform/ethereum/h1/top/1,','').replace(payload,'')
start = '["screener",'
end = ']["latestBlock",'
dirty = d[d.find(start)+len(start):d.rfind(end)].strip()
clean = json.loads(dirty)
print(clean)
# Initialize the extra headers for the websocket connection.
# websocket-client generates Upgrade, Connection, Host, Sec-WebSocket-Key and
# Sec-WebSocket-Version itself, and it expects extra headers in the `header`
# keyword (not `headers`) and the origin in the `origin` keyword.
ws_headers = {
    'Accept-Language': 'en-ZA,en;q=0.9,en-GB;q=0.8,en-US;q=0.7,de;q=0.6',
    'Cache-Control': 'no-cache',
    'Pragma': 'no-cache',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36'
}
# Then create a connection to the tunnel
ws = create_connection(f"wss://io4.dexscreener.io/u/ws3/screener3/?EIO=4&transport=websocket&sid={code}", header=ws_headers, origin='https://dexscreener.com')
# Then send the initial messages through the tunnel
ws.send('2probe')
ws.send('5')
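# Here you will see the messages returned from the tunnel
prefix = '42/u/ws/screener/consolidated/platform/ethereum/h1/top/1,'
while True:
    message = ws.recv()
    if message == '2':
        # Engine.IO v4 heartbeat: answer the server's ping with a pong
        ws.send('3')
        continue
    try:
        json_data = json.loads(message.replace(prefix, ''))
    except ValueError:
        # Not a JSON event (for example the '3probe' reply); skip it
        continue
    print(json_data)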
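The string surgery on `step4.text` can be made a little more systematic. The sketch below assumes Engine.IO v4 framing (packets in one polling body separated by the `\x1e` record separator) and Socket.IO event packets of the form `42<namespace>,<JSON array>`. `decode_polling_body` is a hypothetical helper, and the sample payload with its `pairs`, `symbol` and `blockNumber` fields is invented for illustration, not taken from the site:

```python
import json

RECORD_SEPARATOR = "\x1e"  # Engine.IO v4 separates polling packets with this


def decode_polling_body(body: str, namespace: str) -> list:
    """Return (event_name, args) pairs for the Socket.IO events in a polling body."""
    prefix = f"42{namespace},"
    events = []
    for packet in body.split(RECORD_SEPARATOR):
        if not packet.startswith(prefix):
            continue  # skip connect acks, pings and other packet types
        name, *args = json.loads(packet[len(prefix):])
        events.append((name, args))
    return events


ns = "/u/ws/screener/consolidated/platform/ethereum/h1/top/1"
# Invented sample body: one connect ack followed by two events.
sample = RECORD_SEPARATOR.join([
    f'40{ns},{{"sid":"x"}}',
    f'42{ns},["screener",{{"pairs":[{{"symbol":"AAA"}}]}}]',
    f'42{ns},["latestBlock",{{"blockNumber":1}}]',
])
events = decode_polling_body(sample, ns)
for name, args in events:
    print(name, args)
```

Splitting on the separator before parsing avoids relying on the exact position of the `["screener",` and `]["latestBlock",` markers, so it should keep working if the server changes the order of the events.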