Extract Name Text from nested table in Python using Beautiful Soup
I am relatively new to web scraping with Python, and I am having a lot of difficulty pulling the name value out of an HTML table row on CoinMarketCap.com. Their structure is unfamiliar to me. I have tried several methods, from Stack Overflow and from other sites, to no avail.
Here is a snippet of their HTML: https://i.stack.imgur.com/eBamV.png
This is the code I currently have:
import requests
from bs4 import BeautifulSoup

page = requests.get("https://coinmarketcap.com/rankings/exchanges/").text
soup = BeautifulSoup(page, features="html.parser")
tags = soup.findAll("div", class_="sc-16r8icm-0 sc-1teo54s-1 dNOTPP")
tables = soup.findChildren('tr')
my_table = tables[0]
rows = my_table.findChildren(['td'])
print(rows)
for row in rows:
    cells = row.findChildren('td')
    for cell in cells:
        value = cell.string
        print("the value in this cell is %s" % value)
Thanks in advance for any help!
The value sc-16r8icm-0 sc-1teo54s-1 dNOTPP is three classes separated by spaces, not a single class name. If you need to identify an element by multiple classes, use a selector like this:
tags = soup.select("div.sc-16r8icm-0.sc-1teo54s-1.dNOTPP")
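To illustrate the difference, here is a minimal sketch that runs against a small inline HTML string instead of the live site. The markup is hypothetical and only borrows the class names from the question; the real page may look quite different.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modeled on the class names in the question.
html = """
<div class="sc-16r8icm-0 sc-1teo54s-1 dNOTPP">Binance</div>
<div class="sc-16r8icm-0 other">not a match</div>
<div class="dNOTPP sc-1teo54s-1 sc-16r8icm-0">Kraken</div>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector: matches elements that carry all three classes, in any order.
by_selector = [
    tag.get_text(strip=True)
    for tag in soup.select("div.sc-16r8icm-0.sc-1teo54s-1.dNOTPP")
]

# class_ with a space-separated string: matches only when the class
# attribute is exactly this string, so the reordered div is skipped.
by_exact_string = [
    tag.get_text(strip=True)
    for tag in soup.find_all("div", class_="sc-16r8icm-0 sc-1teo54s-1 dNOTPP")
]

print(by_selector)
print(by_exact_string)
```

Note that generated class names like these tend to change between site builds, so a selector based on them may stop matching without warning.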
The data you see is embedded within the page in JSON form. To parse it, you could use the following example:
import json
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "https://coinmarketcap.com/rankings/exchanges/"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
data = soup.select_one("#__NEXT_DATA__").text
data = json.loads(data)
# uncomment this to print all data:
# print(json.dumps(data, indent=4))
df = pd.json_normalize(data["props"]["initialProps"]["pageProps"]["exchange"])
print(df.head().to_markdown())
Prints:
|    | id | name | slug | score | countries | fiats | totalVol24h | spotVol24h | derivativesVol24h | derivativesOpenInterests | derivativesMarketPairs | totalVolChgPct24h | totalVolChgPct7d | visits | liquidity | numMarkets | numCoins | dateLaunched | lastUpdated | marketSharePct | type | makerFee | takerFee | rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 270 | Binance | binance | 9.9 | [] | ['AED', 'ARS', 'AUD', 'AZN', 'BRL', 'CAD', 'CHF', 'CLP', 'COP', 'CZK', 'EGP', 'EUR', 'GBP', 'GHS', 'HKD', 'HRK', 'HUF', 'IDR', 'ILS', 'INR', 'ISK', 'JPY', 'KES', 'KRW', 'KZT', 'MXN', 'NGN', 'NOK', 'NZD', 'PEN', 'PHP', 'PLN', 'RON', 'RUB', 'SAR', 'SEK', 'SGD', 'THB', 'TRY', 'TWD', 'UAH', 'UGX', 'USD', 'UYU', 'VND', 'ZAR'] | 5.56801e+10 | 1.42812e+10 | 4.21641e+10 | 1.57537e+10 | 203 | -20.7533 | -65.7038 | 2.20602e+07 | 816 | 1667 | 394 | 2017-07-14T00:00:00.000Z | 2022-05-17T20:08:11.000Z | 0.0023 |  | 0.02 | 0.04 | 1 |
| 1 | 524 | FTX | ftx | 8.3819 | [] | ['USD', 'EUR', 'GBP', 'AUD', 'HKD', 'SGD', 'ZAR', 'CAD', 'CHF', 'BRL'] | 7.57339e+09 | 2.12004e+09 | 5.61716e+09 | 3.46104e+09 | 43 | -21.1298 | -58.9183 | 4.71841e+06 | 722 | 466 | 326 | 2019-02-25T00:00:00.000Z | 2022-05-17T20:08:11.000Z | 0.0003 |  | 0.02 | 0.07 | 2 |
| 2 | 89 | Coinbase Exchange | coinbase-exchange | 8.303 | [] | ['USD', 'EUR', 'GBP'] | 1.80697e+09 | 1.80757e+09 | nan | nan | nan | -13.3741 | -68.7096 | 2.19108e+06 | 717 | 503 | 173 | 2014-05-24T00:00:00.000Z | 2022-05-17T20:08:11.000Z | 0.0003 |  | 0 | 0 | 3 |
| 3 | 24 | Kraken | kraken | 7.9853 | [] | ['USD', 'EUR', 'GBP', 'CAD', 'JPY', 'CHF', 'AUD'] | 8.10391e+08 | 7.66352e+08 | 2.74902e+11 | 4.01852e+07 | 28 | -14.7475 | -63.5845 | 1.72099e+06 | 739 | 542 | 167 | 2011-07-28T00:00:00.000Z | 2022-05-17T20:08:11.000Z | 0.0001 |  | 0.02 | 0.05 | 4 |
| 4 | 311 | KuCoin | kucoin | 7.486 | [] | ['USD', 'AED', 'ARS', 'AUD', 'AGN', 'BGN', 'BRL', 'CAD', 'CHF', 'CLP', 'COP', 'CRC', 'CZK', 'DKK', 'DOP', 'EUR', 'GBP', 'GEL', 'HKD', 'HUF', 'ILS', 'INR', 'JPY', 'KRW', 'KZT', 'MAD', 'MDL', 'MXN', 'MYR', 'NAD', 'NGN', 'NOK', 'NZD', 'PEN', 'PHP', 'PLN', 'QAR', 'RON', 'RUB', 'SEK', 'SGD', 'TRY', 'TWD', 'UAH', 'USD', 'UYU', 'UZS', 'ZAR'] | 5.17875e+09 | 1.58063e+09 | 3.61257e+09 | 9.08548e+08 | 112 | -12.0398 | -62.4081 | 2.55465e+06 | 547 | 1291 | 696 | 2017-08-13T00:00:00.000Z | 2022-05-17T20:08:11.000Z | 0.0002 |  | 0 | 0 | 5 |
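Since the question asks only for the name values, a smaller variant of the same idea may be enough. The sketch below reads the embedded JSON and returns just the names. It assumes the JSON keeps the props / initialProps / pageProps / exchange path used above, which the site can change at any time. The helper name extract_exchange_names and the sample HTML are hypothetical, and the sample stands in for the real page so the example runs without a network request.

```python
import json

from bs4 import BeautifulSoup


def extract_exchange_names(html):
    """Return the 'name' field of every exchange in the embedded JSON."""
    soup = BeautifulSoup(html, "html.parser")
    script = soup.select_one("#__NEXT_DATA__")
    if script is None or script.string is None:
        # The script tag is missing or empty: the page layout has changed.
        return []
    data = json.loads(script.string)
    # Assumed path, taken from the pandas example; adjust if the site changes.
    exchanges = data["props"]["initialProps"]["pageProps"]["exchange"]
    return [item["name"] for item in exchanges]


# Hypothetical, trimmed-down stand-in for the real page.
sample_html = """
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"initialProps": {"pageProps": {"exchange": [
  {"id": 270, "name": "Binance", "slug": "binance"},
  {"id": 24, "name": "Kraken", "slug": "kraken"}
]}}}}
</script>
</body></html>
"""

names = extract_exchange_names(sample_html)
print(names)
```

For the live page, the text returned by requests.get(url) could be passed in place of sample_html. If the DataFrame from the example above is already available, df["name"] holds the same values.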