
Extract name text from a nested table in Python using Beautiful Soup

I am relatively new to web scraping with Python, and I am having a lot of difficulty pulling the name value out of an HTML table row on CoinMarketCap.com. Their structure is unfamiliar to me. I have tried several methods, from both Stack Overflow and other sites, to no avail. Here is a snippet of their HTML: https://i.stack.imgur.com/eBamV.png

This is the code I currently have:

import requests
from bs4 import BeautifulSoup

page = requests.get("https://coinmarketcap.com/rankings/exchanges/").text
soup = BeautifulSoup(page, features="html.parser")
tags = soup.findAll("div", class_="sc-16r8icm-0 sc-1teo54s-1 dNOTPP")
tables = soup.findChildren('tr')
my_table = tables[0]
rows = my_table.findChildren(['td'])
print(rows)
for row in rows:
    cells = row.findChildren('td')
    for cell in cells:
        value = cell.string
        print("the value in this cell is %s" % value)

Thanks in advance for any help!

The value sc-16r8icm-0 sc-1teo54s-1 dNOTPP is three separate classes delimited by spaces. To identify an element by multiple classes, use a CSS selector like this:

tags = soup.select("div.sc-16r8icm-0.sc-1teo54s-1.dNOTPP")
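To illustrate the difference, here is a minimal sketch on a made-up HTML fragment (the markup and the names in it are assumptions for demonstration, not the real CoinMarketCap page). Passing a space-separated string to class_ compares against the exact attribute value, while the chained selector matches any element that carries all three classes, regardless of order or extra classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same three classes, but in a different order
# (plus an extra class) on the second div.
html = """
<div class="sc-16r8icm-0 sc-1teo54s-1 dNOTPP">Binance</div>
<div class="dNOTPP sc-16r8icm-0 sc-1teo54s-1 extra">FTX</div>
<div class="sc-16r8icm-0">Other</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Matches only when the class attribute is exactly this string.
exact = soup.find_all("div", class_="sc-16r8icm-0 sc-1teo54s-1 dNOTPP")

# Matches every div that has all three classes.
by_selector = soup.select("div.sc-16r8icm-0.sc-1teo54s-1.dNOTPP")

exact_names = [tag.get_text(strip=True) for tag in exact]
selector_names = [tag.get_text(strip=True) for tag in by_selector]
print(exact_names)
print(selector_names)
```

Generated class names like these tend to change between site builds, so either form may stop matching over time.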

The data you see is embedded within the page in JSON form. To parse it, you could use the following example:

import json
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://coinmarketcap.com/rankings/exchanges/"

soup = BeautifulSoup(requests.get(url).content, "html.parser")
data = soup.select_one("#__NEXT_DATA__").text
data = json.loads(data)

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

df = pd.json_normalize(data["props"]["initialProps"]["pageProps"]["exchange"])

# to_markdown() requires the optional "tabulate" package
print(df.head().to_markdown())

Prints:

|    | id | name | slug | score | countries | fiats | totalVol24h | spotVol24h | derivativesVol24h | derivativesOpenInterests | derivativesMarketPairs | totalVolChgPct24h | totalVolChgPct7d | visits | liquidity | numMarkets | numCoins | dateLaunched | lastUpdated | marketSharePct | type | makerFee | takerFee | rank |
|---:|---:|:-----|:-----|------:|:----------|:------|------------:|-----------:|------------------:|-------------------------:|-----------------------:|------------------:|-----------------:|-------:|----------:|-----------:|---------:|:-------------|:------------|---------------:|:-----|---------:|---------:|-----:|
| 0 | 270 | Binance | binance | 9.9 | [] | ['AED', 'ARS', 'AUD', 'AZN', 'BRL', 'CAD', 'CHF', 'CLP', 'COP', 'CZK', 'EGP', 'EUR', 'GBP', 'GHS', 'HKD', 'HRK', 'HUF', 'IDR', 'ILS', 'INR', 'ISK', 'JPY', 'KES', 'KRW', 'KZT', 'MXN', 'NGN', 'NOK', 'NZD', 'PEN', 'PHP', 'PLN', 'RON', 'RUB', 'SAR', 'SEK', 'SGD', 'THB', 'TRY', 'TWD', 'UAH', 'UGX', 'USD', 'UYU', 'VND', 'ZAR'] | 5.56801e+10 | 1.42812e+10 | 4.21641e+10 | 1.57537e+10 | 203 | -20.7533 | -65.7038 | 2.20602e+07 | 816 | 1667 | 394 | 2017-07-14T00:00:00.000Z | 2022-05-17T20:08:11.000Z | 0.0023 | | 0.02 | 0.04 | 1 |
| 1 | 524 | FTX | ftx | 8.3819 | [] | ['USD', 'EUR', 'GBP', 'AUD', 'HKD', 'SGD', 'ZAR', 'CAD', 'CHF', 'BRL'] | 7.57339e+09 | 2.12004e+09 | 5.61716e+09 | 3.46104e+09 | 43 | -21.1298 | -58.9183 | 4.71841e+06 | 722 | 466 | 326 | 2019-02-25T00:00:00.000Z | 2022-05-17T20:08:11.000Z | 0.0003 | | 0.02 | 0.07 | 2 |
| 2 | 89 | Coinbase Exchange | coinbase-exchange | 8.303 | [] | ['USD', 'EUR', 'GBP'] | 1.80697e+09 | 1.80757e+09 | nan | nan | nan | -13.3741 | -68.7096 | 2.19108e+06 | 717 | 503 | 173 | 2014-05-24T00:00:00.000Z | 2022-05-17T20:08:11.000Z | 0.0003 | | 0 | 0 | 3 |
| 3 | 24 | Kraken | kraken | 7.9853 | [] | ['USD', 'EUR', 'GBP', 'CAD', 'JPY', 'CHF', 'AUD'] | 8.10391e+08 | 7.66352e+08 | 2.74902e+11 | 4.01852e+07 | 28 | -14.7475 | -63.5845 | 1.72099e+06 | 739 | 542 | 167 | 2011-07-28T00:00:00.000Z | 2022-05-17T20:08:11.000Z | 0.0001 | | 0.02 | 0.05 | 4 |
| 4 | 311 | KuCoin | kucoin | 7.486 | [] | ['USD', 'AED', 'ARS', 'AUD', 'AGN', 'BGN', 'BRL', 'CAD', 'CHF', 'CLP', 'COP', 'CRC', 'CZK', 'DKK', 'DOP', 'EUR', 'GBP', 'GEL', 'HKD', 'HUF', 'ILS', 'INR', 'JPY', 'KRW', 'KZT', 'MAD', 'MDL', 'MXN', 'MYR', 'NAD', 'NGN', 'NOK', 'NZD', 'PEN', 'PHP', 'PLN', 'QAR', 'RON', 'RUB', 'SEK', 'SGD', 'TRY', 'TWD', 'UAH', 'USD', 'UYU', 'UZS', 'ZAR'] | 5.17875e+09 | 1.58063e+09 | 3.61257e+09 | 9.08548e+08 | 112 | -12.0398 | -62.4081 | 2.55465e+06 | 547 | 1291 | 696 | 2017-08-13T00:00:00.000Z | 2022-05-17T20:08:11.000Z | 0.0002 | | 0 | 0 | 5 |
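Since the question asks only for the names, the same embedded JSON can be read without pandas. The following is a minimal sketch that runs offline against a trimmed, hand-written stand-in for the page: the HTML fragment and its three records are assumptions for illustration, and the key path is the one used above, which may change whenever the site is rebuilt.

```python
import json
from bs4 import BeautifulSoup

# Hypothetical, heavily trimmed stand-in for the real page source.
html = """
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"initialProps": {"pageProps": {"exchange": [
  {"id": 270, "name": "Binance", "rank": 1},
  {"id": 524, "name": "FTX", "rank": 2},
  {"id": 89, "name": "Coinbase Exchange", "rank": 3}
]}}}}
</script>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# The script tag holds the page state as a single JSON string.
script = soup.select_one("#__NEXT_DATA__")
data = json.loads(script.string)

# Walk down to the list of exchange records and keep only the names.
exchanges = data["props"]["initialProps"]["pageProps"]["exchange"]
names = [exchange["name"] for exchange in exchanges]
print(names)
```

Against the live site, html would instead come from requests.get(url).content, as in the answer's code.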


 