How do I scrape data from an HTML table using Python and BeautifulSoup?

Question

If you look at this page, https://metals-api.com/currencies, there is an HTML table with 2 columns. I would like to extract all the rows from column 1 into a list/array. How do I go about this?

import requests
from bs4 import BeautifulSoup

URL = "https://metals-api.com/currencies"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

with open('outpu2t.txt', 'w', encoding='utf-8') as f:
    f.write(soup.text)

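To clarify, I am not looking to run fetch-price commands against these codes. I am trying to compile a list of the codes so that I can add them to a dropdown menu in my application.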

Answer 1

If I understand the question correctly, you can try the following example:

import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = "https://metals-api.com/currencies"
page = requests.get(URL)

soup = BeautifulSoup(page.content, "html.parser")

# Collect the text of the first cell in every table body row.
data = []
for cell in soup.select('.table tbody tr td:nth-child(1)'):
    data.append(cell.text)

df = pd.DataFrame(data, columns=['code'])
# df.to_csv('code.csv', index=False)  # uncomment to store the data
print(df)

Output:

     code
0     XAU
1     XAG
2     XPT
3     XPD
4     XCU
..    ...
209  LINK
210   XLM
211   ADA
212   BCH
213   LTC

[214 rows x 1 columns]
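Since the question asks for a plain list rather than a DataFrame, the same selector can feed a list directly. The following is a minimal sketch that runs offline: SAMPLE_HTML is a hypothetical stand-in for the page's markup (the real page may be structured differently), and first_column is an illustrative helper name, not something from the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's markup; the real page may differ.
SAMPLE_HTML = """
<table class="table">
  <thead><tr><th>Code</th><th>Name</th></tr></thead>
  <tbody>
    <tr><td>XAU</td><td>Gold</td></tr>
    <tr><td> XAG </td><td>Silver</td></tr>
  </tbody>
</table>
"""


def first_column(html):
    """Return the stripped text of the first cell in every body row."""
    soup = BeautifulSoup(html, "html.parser")
    cells = soup.select(".table tbody tr td:nth-child(1)")
    # get_text(strip=True) drops surrounding whitespace from each cell.
    return [cell.get_text(strip=True) for cell in cells]


codes = first_column(SAMPLE_HTML)
print(codes)
```

To use it against the live page, pass page.content from the requests call above instead of SAMPLE_HTML. The resulting list can then be handed to whatever dropdown widget the application uses.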

Answer 2

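I stand corrected. I initially just tried pd.read_html("https://metals-api.com/currencies"), which would normally work. Apparently, with a very slight workaround (fetching the page with requests first), it can still work just fine.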

import pandas as pd
import requests
URL = "https://metals-api.com/currencies"
page = requests.get(URL)
df = pd.read_html(page.content)[0]
print(df)

Output:

     Code                                               Name
0     XAU  1 Ounce of 24K Gold. Use Carat endpoint to dis...
1     XAG                                             Silver
2     XPT                                           Platinum
3     XPD                                          Palladium
4     XCU                                             Copper
..    ...                                                ...
209  LINK                                          Chainlink
210   XLM                                            Stellar
211   ADA                                            Cardano
212   BCH                                       Bitcoin Cash
213   LTC                                           Litecoin

[214 rows x 2 columns]
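To get from this DataFrame to the list the question asks for, the Code column can be converted with tolist(). The sketch below uses a small hand-built DataFrame as a hypothetical stand-in for the result of pd.read_html, with shortened names, so that it runs without network access. The code-to-name mapping is an optional extra, assuming the dropdown should show readable labels.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame that pd.read_html returns above.
df = pd.DataFrame(
    {
        "Code": ["XAU", "XAG", "XPT"],
        "Name": ["Gold", "Silver", "Platinum"],
    }
)

# Keep only the first column as a plain list of strings.
codes = df["Code"].tolist()

# Optionally build code -> name pairs for labelled dropdown entries.
labels = dict(zip(df["Code"], df["Name"]))

print(codes)
print(labels)
```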

2 solutions

Solution 1
1 Accepted 2022-04-14 19:12:43

Solution 2
1 2022-04-14 19:43:31
