
Can't scrape the value of a certain field from a webpage using requests

I'm trying to scrape the value of Balance from a webpage using the requests module. I've searched for the text Balance in dev tools and in the page source but couldn't find it anywhere. I'm hoping there is a way to grab the value of Balance from that webpage without using a browser simulator.

website address

Output I'm after:

[screenshot of the Balance value]

I've tried with:

import requests
from bs4 import BeautifulSoup

link = 'https://tronscan.org/?fbclid=IwAR2WiSKZoTDPWX1ufaAIEg9vaA5oLj9Yd_RUfpjE6MWEQKRGBaK-L_JdtwQ#/contract/TCSPn1Lbdv62QfSCczbLdwupNoCFYAfUVL'

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36"}

res = requests.get(link,headers=headers)
soup = BeautifulSoup(res.text,'lxml')
balance = soup.select_one("li:has(> p:contains('Balance'))").get_text(strip=True)
print(balance)

The page's HTML doesn't contain the balance because the page makes AJAX requests that return the information you want after the page has loaded. You can inspect these requests by opening the developer tools (F12 in Chrome; it may differ in other browsers) and going to the Network tab, where you'll see this:

[screenshot of the Network tab]

Here you can see that the request you want is account?address= followed by the address that appears in the page's URL. Mousing over it shows the complete URL for the AJAX request (highlighted in coral), and the part of the response that holds the data you want is on the right (highlighted in turquoise).
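As a rough illustration of how the AJAX URL relates to the page URL, the sketch below pulls the address out of the page URL's fragment and appends it to the API endpoint. The function name `build_api_url` and the constant `API_BASE` are made up for this example, and it assumes the address is always the last path segment of the fragment, which may not hold for every page on the site.

```python
from urllib.parse import urlencode, urlparse

# Endpoint seen in the Network tab; treated here as a fixed base (assumption).
API_BASE = "https://apilist.tronscan.org/api/account"


def build_api_url(page_url):
    """Build the account API URL from a page URL of the form '...#/contract/<address>'."""
    # The part after '#' is the fragment, e.g. '/contract/TCSP...'.
    fragment = urlparse(page_url).fragment
    # Assumption: the address is the last path segment of the fragment.
    address = fragment.rstrip("/").rsplit("/", 1)[-1]
    if not address:
        raise ValueError("no address found in the URL fragment")
    return f"{API_BASE}?{urlencode({'address': address})}"


page_url = (
    "https://tronscan.org/?fbclid=IwAR2WiSKZoTDPWX1ufaAIEg9vaA5oLj9Yd_"
    "RUfpjE6MWEQKRGBaK-L_JdtwQ#/contract/TCSPn1Lbdv62QfSCczbLdwupNoCFYAfUVL"
)
print(build_api_url(page_url))
```

The fragment never reaches the server, which is one more reason the HTML returned for the page URL cannot contain data specific to that address.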

You can look at the response by going here and finding tokenBalances.
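If the response is large, a small recursive search can locate a key for you instead of scanning the JSON by eye. This is only a sketch: `find_key_paths` is a hypothetical helper, and the `sample` dictionary is invented data shaped loosely like the response, not real API output.

```python
def find_key_paths(obj, target, path=()):
    """Yield the path (a tuple of keys and list indexes) to every occurrence of `target`."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == target:
                yield path + (key,)
            # Keep descending, since the key may also appear deeper down.
            yield from find_key_paths(value, target, path + (key,))
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from find_key_paths(item, target, path + (index,))


# Invented sample data; the real response has many more fields.
sample = {"name": "example", "tokenBalances": [{"balance": "0"}]}

print(list(find_key_paths(sample, "balance")))
```

Each path can be followed key by key to reach the value, for example `sample["tokenBalances"][0]["balance"]`.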

In order to get the balance in Python you can run the following:

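import requests

# Endpoint the page calls via AJAX; the address is the one from the page URL
url = 'https://apilist.tronscan.org/api/account?address=TCSPn1Lbdv62QfSCczbLdwupNoCFYAfUVL'
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36"}

response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # stop here on an HTTP error status
data = response.json()  # parse the JSON body into a dict

# The balance sits on the first entry of tokenBalances
balance = data['tokenBalances'][0]['balance']

print(balance)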
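The lookup above raises an exception if tokenBalances is missing or empty, and the number it prints may not match what the page displays. The sketch below is one way to handle both, under two assumptions that are not confirmed by the answer: that the first entry is the TRX balance, and that it is expressed in sun (1 TRX = 1,000,000 sun). The helper names and the sample payload are invented for illustration.

```python
from decimal import Decimal

SUN_PER_TRX = Decimal(1_000_000)  # 1 TRX = 1,000,000 sun


def first_token_balance(payload):
    """Return the raw balance of the first tokenBalances entry, or None if absent."""
    entries = payload.get("tokenBalances") or []
    if not entries:
        return None
    return entries[0].get("balance")


def sun_to_trx(raw_balance):
    """Convert a raw amount in sun (int or numeric string) to TRX."""
    # Going through str() avoids float rounding for large amounts.
    return Decimal(str(raw_balance)) / SUN_PER_TRX


# Invented payload; in practice this would be the parsed JSON from the API.
sample_payload = {"tokenBalances": [{"balance": "1500000"}]}

raw = first_token_balance(sample_payload)
if raw is not None:
    print(raw, sun_to_trx(raw))
```

If the unit assumption is wrong for a given token, the conversion should be dropped or adjusted to that token's number of decimals.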


 