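Pandas and bs4 skip hyperlink in scraped table
I'm trying to scrape a table from MTGGoldfish using pandas and bs4. The long-term goal is to text myself the movers and shakers list. I get 4 of the 5 columns, but the column that contains a hyperlink is skipped and shows up as something odd. All I want is the display name of the hyperlink so that I can read it as part of the table.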
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
response = requests.get("https://www.mtggoldfish.com/movers/paper/standard")
soup = bs(response.text, "html.parser")
table = soup.find_all('table')
df = pd.read_html(str(table))[0]
print(df)
The output is this:
Top Winners Top Winners.1 ... Top Winners.3 Top Winners.4
0 5.49 xznr ... $ 16.00 +52%
1 0.96 thb ... $ 18.99 +5%
2 0.63 xznr ... $ 5.46 +13%
3 0.49 m21 ... $ 4.99 +11%
4 0.41 xznr ... $ 4.45 +10%
5 0.32 xznr ... $ 17.10 +2%
6 0.25 xznr ... $ 0.71 +54%
7 0.25 xznr ... $ 0.67 +60%
8 0.15 eld ... $ 18.70 +1%
9 0.12 thb ... $ 11.87 +1%
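Column 3 is the card name, which on the site is a hyperlink to the card's page. I can't figure out how to pull everything together.
The column is not actually skipped: the "..." in the printed output is pandas shortening the display of a wide DataFrame. Just call .to_string():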
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
response = requests.get("https://www.mtggoldfish.com/movers/paper/standard")
soup = bs(response.text, "html.parser")
table = soup.find_all("table")
df = pd.read_html(str(table))[0]
print(df.to_string())
Output:
Top Winners Top Winners.1 Top Winners.2 Top Winners.3 Top Winners.4
0 0.96 thb Kroxa, Titan of Death's Hunger $ 18.99 +5%
1 0.63 xznr Clearwater Pathway $ 5.46 +13%
2 0.49 m21 Thieves' Guild Enforcer $ 4.99 +11%
3 0.41 xznr Skyclave Apparition $ 4.45 +10%
4 0.32 xznr Scourge of the Skyclaves $ 17.10 +2%
5 0.25 xznr Malakir Rebirth $ 0.71 +54%
6 0.25 xznr Blackbloom Rogue $ 0.67 +60%
7 0.16 xznr Zof Consumption $ 0.63 +34%
8 0.15 eld Oko, Thief of Crowns $ 18.70 +1%
9 0.12 thb Heliod, Sun-Crowned $ 11.87 +1%
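The difference between the two printouts can be reproduced without any network access. The following is a minimal sketch, assuming a small hand-built DataFrame (two rows copied from the output above, with made-up short column names) and an explicit `display.max_columns` cap so the truncation is deterministic rather than dependent on terminal width:

```python
import pandas as pd

# Two rows taken from the output above; the column names are
# hypothetical short labels chosen for this sketch.
df = pd.DataFrame(
    {
        "change": [0.96, 0.63],
        "set": ["thb", "xznr"],
        "card": ["Kroxa, Titan of Death's Hunger", "Clearwater Pathway"],
        "price": ["$ 18.99", "$ 5.46"],
        "pct": ["+5%", "+13%"],
    }
)

# Cap the number of displayed columns below the frame's five columns,
# which makes the default repr hide the middle column behind "...".
with pd.option_context("display.max_columns", 4):
    truncated = repr(df)

# to_string() renders every column regardless of the display options.
full = df.to_string()

print(truncated)
print(full)
```

If the full view is wanted everywhere, setting `pd.set_option("display.max_columns", None)` once at the top of a script should have a similar effect on plain `print(df)`.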
You can read the HTML tables directly into pandas. The flavor can be set to "bs4" (pandas does not accept "html.parser" as a flavor), but "lxml" is faster.
import pandas as pd
tables = pd.read_html("https://www.mtggoldfish.com/movers/paper/standard", flavor='lxml')
# Daily change
daily_winners = tables[0]
daily_losers = tables[1]
# Weekly change
weekly_winners = tables[2]
weekly_losers = tables[3]
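If the link target is wanted in addition to the display name, the cells can be walked with BeautifulSoup directly. This is only a sketch: the HTML below is a hypothetical stand-in for one scraped table (the real page's markup and URLs may differ), and the column names are invented for illustration.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup standing in for one table on the page;
# the hrefs are made up and do not come from the real site.
html = """
<table>
  <tr><th>Change</th><th>Set</th><th>Card</th><th>Price</th><th>Pct</th></tr>
  <tr><td>0.96</td><td>thb</td>
      <td><a href="/price/example/kroxa">Kroxa, Titan of Death's Hunger</a></td>
      <td>$ 18.99</td><td>+5%</td></tr>
  <tr><td>0.63</td><td>xznr</td>
      <td><a href="/price/example/clearwater-pathway">Clearwater Pathway</a></td>
      <td>$ 5.46</td><td>+13%</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.find("table").find_all("tr")[1:]:  # skip the header row
    cells = tr.find_all("td")
    link = cells[2].find("a")  # the card-name cell wraps its text in <a>
    rows.append(
        {
            "change": float(cells[0].get_text(strip=True)),
            "set": cells[1].get_text(strip=True),
            # Fall back to the plain cell text if a row has no link.
            "card": link.get_text(strip=True) if link else cells[2].get_text(strip=True),
            "url": link["href"] if link else None,
            "price": cells[3].get_text(strip=True),
            "pct": cells[4].get_text(strip=True),
        }
    )

cards = pd.DataFrame(rows)
print(cards.to_string())
```

With pandas 1.5 or newer, `pd.read_html` also accepts an `extract_links` argument that returns (text, href) pairs for cells; that route is not shown here.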