Pandas and bs4 skip hyperlink in scraped table

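I am trying to scrape a table from MTG Goldfish using Pandas and bs4. The long-term goal is to text myself the movers and shakers list. I get 4 of the 5 columns, but the one that contains a hyperlink is skipped and shows a weird result instead. All I want is the display name of the hyperlink, so that I can read the whole thing as a table.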

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd


response = requests.get("https://www.mtggoldfish.com/movers/paper/standard")
soup = bs(response.text, "html.parser")


table = soup.find_all('table')
df = pd.read_html(str(table))[0]
print(df)

The output is this:

 Top Winners Top Winners.1  ... Top Winners.3 Top Winners.4
0         5.49          xznr  ...       $ 16.00          +52%
1         0.96           thb  ...       $ 18.99           +5%
2         0.63          xznr  ...        $ 5.46          +13%
3         0.49           m21  ...        $ 4.99          +11%
4         0.41          xznr  ...        $ 4.45          +10%
5         0.32          xznr  ...       $ 17.10           +2%
6         0.25          xznr  ...        $ 0.71          +54%
7         0.25          xznr  ...        $ 0.67          +60%
8         0.15           eld  ...       $ 18.70           +1%
9         0.12           thb  ...       $ 11.87           +1%

The third column is the card name, which is a hyperlink to the card's page on the site. I can't figure out how to extract everything together.

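Just call .to_string(). The column is not being skipped by the scraper: print(df) only abbreviates wide frames, and the "..." in the output above marks the columns hidden from the display.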

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

response = requests.get("https://www.mtggoldfish.com/movers/paper/standard")
soup = bs(response.text, "html.parser")

table = soup.find_all("table")

df = pd.read_html(str(table))[0]
print(df.to_string())

Output:

   Top Winners Top Winners.1                   Top Winners.2 Top Winners.3 Top Winners.4
0         0.96           thb  Kroxa, Titan of Death's Hunger       $ 18.99           +5%
1         0.63          xznr              Clearwater Pathway        $ 5.46          +13%
2         0.49           m21         Thieves' Guild Enforcer        $ 4.99          +11%
3         0.41          xznr             Skyclave Apparition        $ 4.45          +10%
4         0.32          xznr        Scourge of the Skyclaves       $ 17.10           +2%
5         0.25          xznr                 Malakir Rebirth        $ 0.71          +54%
6         0.25          xznr                Blackbloom Rogue        $ 0.67          +60%
7         0.16          xznr                 Zof Consumption        $ 0.63          +34%
8         0.15           eld            Oko, Thief of Crowns       $ 18.70           +1%
9         0.12           thb             Heliod, Sun-Crowned       $ 11.87           +1%
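
To see why .to_string() helps, here is a minimal sketch that reproduces the "..." column without touching the network. The frame is a hypothetical stand-in built from two rows of the output above, and the option values are picked only to force the truncation; in a real terminal pandas chooses them from the window size.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: five columns, one of them wide.
df = pd.DataFrame(
    {
        "Top Winners": [0.96, 0.63],
        "Top Winners.1": ["thb", "xznr"],
        "Top Winners.2": ["Kroxa, Titan of Death's Hunger", "Clearwater Pathway"],
        "Top Winners.3": ["$ 18.99", "$ 5.46"],
        "Top Winners.4": ["+5%", "+13%"],
    }
)

# Allow only four columns in the repr: pandas hides the middle one behind "...".
with pd.option_context("display.max_columns", 4):
    truncated = repr(df)

# Lift the column limit for this block only: every column is shown.
with pd.option_context("display.max_columns", None, "display.width", 200):
    full = repr(df)

print(truncated)
print(full)
print(df.to_string())  # ignores the display limits, like the answer above
```

The data was in the DataFrame all along; only the printed representation was shortened. pd.set_option("display.max_columns", None) makes the same change for the whole session if you prefer plain print(df).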

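You can read HTML tables directly into pandas, without calling requests or bs4 yourself. The flavor can also be set to "bs4" (pandas does not accept "html.parser" as a flavor name), but "lxml" is faster.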

import pandas as pd

tables = pd.read_html("https://www.mtggoldfish.com/movers/paper/standard", flavor='lxml')

# Daily Change
daily_winners = tables[0]
daily_losers = tables[1]

# Weekly Change
weekly_winners = tables[2]
weekly_losers = tables[3]
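
If you later want the link target as well as the display name (for example, to include the card page in the text message), one possible approach is to walk the table rows with bs4 yourself. This is only a sketch: the HTML snippet, the header names, the href values and the table_to_frame helper are all made up for illustration, and the real page markup may differ.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup, loosely modelled on a movers table (not copied from the site).
HTML = """
<table>
  <thead>
    <tr><th>Change</th><th>Set</th><th>Card</th><th>Price</th><th>Pct</th></tr>
  </thead>
  <tbody>
    <tr><td>0.96</td><td>thb</td>
        <td><a href="/price/thb/kroxa">Kroxa, Titan of Death's Hunger</a></td>
        <td>$ 18.99</td><td>+5%</td></tr>
    <tr><td>0.63</td><td>xznr</td>
        <td><a href="/price/xznr/clearwater">Clearwater Pathway</a></td>
        <td>$ 5.46</td><td>+13%</td></tr>
  </tbody>
</table>
"""


def table_to_frame(html):
    """Return the first table as a DataFrame, plus a 'Link' column with each row's href."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    rows = []
    for tr in table.find("tbody").find_all("tr"):
        # The visible text of each cell; for the link cell this is the display name.
        row = [td.get_text(strip=True) for td in tr.find_all("td")]
        link = tr.find("a")
        row.append(link["href"] if link is not None else None)
        rows.append(row)
    return pd.DataFrame(rows, columns=headers + ["Link"])


frame = table_to_frame(HTML)
print(frame.to_string())
```

The cell text is exactly what pd.read_html already gives you for the card name; the extra column just keeps the href that read_html drops by default.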
