
Pandas can only convert an array of size 1 to a Python scalar

I have this dataframe, df_pm:

                             Player  GameWeek  Minutes  \
PlayerMatchesDetailID                                                 
1             Alisson         1       90   
2     Virgil van Dijk         1       90   
3        Joseph Gomez         1       90 

                             ForTeam               AgainstTeam  \
1                             Liverpool              Norwich City   
2                             Liverpool              Norwich City   
3                             Liverpool              Norwich City  

                             Goals  ShotsOnTarget  ShotsInBox  CloseShots  \
1                             0              0           0           0   
2                             1              1           1           1   
3                             0              0           0           0 
                     TotalShots  Headers  GoalAssists  ShotOnTargetCreated  \
1                             0        0            0                    0   
2                             1        1            0                    0   
3                             0        0            0                    0   
                       ShotInBoxCreated  CloseShotCreated  TotalShotCreated  \
1                             0                 0                 0   
2                             0                 0                 0   
3                             0                 0                 1  
                         HeadersCreated  
1                             0  
2                             0  
3                             0 

and this second dataframe, df_melt:

    MatchID GameWeek        Date                      Team  Home  \
0     46605        1  2019-08-09                 Liverpool  Home   
1     46605        1  2019-08-09              Norwich City  Away   
2     46606        1  2019-08-10           AFC Bournemouth  Home  

                  AgainstTeam  
0                Norwich City  
1                   Liverpool  
2            Sheffield United  
3             AFC Bournemouth  
...
575          Sheffield United  
576          Newcastle United  
577               Southampton

and this snippet, which uses both:

match_ids = []
home_away = []
dates = []

#For each row in the player matches dataframe...
for row in df_pm.itertuples():
    #Look up the match id from the team matches dataframe
    team = row.ForTeam
    againstteam = row.AgainstTeam
    gameweek = row.GameWeek
    print (team,againstteam,gameweek)

    match_id = df_melt.loc[(df_melt['GameWeek']==gameweek)
                          &(df_melt['Team']==team)
                          &(df_melt['AgainstTeam']==againstteam),
                          'MatchID'].item()

    date = df_melt.loc[(df_melt['GameWeek']==gameweek)
                          &(df_melt['Team']==team)
                          &(df_melt['AgainstTeam']==againstteam),
                          'Date'].item()

    home = df_melt.loc[(df_melt['GameWeek']==gameweek)
                          &(df_melt['Team']==team)
                          &(df_melt['AgainstTeam']==againstteam),
                          'Home'].item()

    match_ids.append(match_id)
    home_away.append(home)
    dates.append(date)

On the first iteration, it prints:

Liverpool
Norwich City
1

but then I get this error:

Traceback (most recent call last):
  File "tableau_data_generation.py", line 166, in <module>
    'MatchID'].item()
  File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/pandas/core/base.py", line 652, in item
    return self.values.item()
ValueError: can only convert an array of size 1 to a Python scalar

Printing the whole df_melt dataframe, I see that these four datetime values are flawed:

540   46875       28         TBC               Aston Villa  Home   
541   46875       28         TBC          Sheffield United  Away   
...
548   46879       28         TBC           Manchester City  Home   
549   46879       28         TBC                   Arsenal  Away  

How do I fix this?

When you call item() on a Series, you should actually receive:

FutureWarning: `item` has been deprecated and will be removed in a future version

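Since item() was deprecated in pandas 0.25.0 and you do not see this warning, it looks like you are using an outdated version of pandas, so you should probably start by upgrading it. (The deprecation was reverted in pandas 1.0, so Series.item() is available again in current releases, but it still raises ValueError unless the Series holds exactly one element.)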

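Even in newer versions of pandas you can still call item(), but on the underlying NumPy array (where, at least for now, it is not deprecated). So change your code to: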

df_melt.loc[...].values.item()

Another option is to use iloc[0], so you can also change your code to:

df_melt.loc[...].iloc[0]
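
As a rough illustration of the two variants when exactly one row matches, here is a minimal sketch. The small df_melt below is a hypothetical stand-in built from the sample rows in the question, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for df_melt, using the sample rows from the question.
df_melt = pd.DataFrame({
    "MatchID": [46605, 46605, 46606],
    "GameWeek": [1, 1, 1],
    "Team": ["Liverpool", "Norwich City", "AFC Bournemouth"],
    "AgainstTeam": ["Norwich City", "Liverpool", "Sheffield United"],
})

# Boolean mask that selects exactly one row in this toy frame.
mask = ((df_melt["GameWeek"] == 1)
        & (df_melt["Team"] == "Liverpool")
        & (df_melt["AgainstTeam"] == "Norwich City"))
selected = df_melt.loc[mask, "MatchID"]

via_item = selected.values.item()  # plain Python int
via_iloc = selected.iloc[0]        # NumPy integer scalar

print(via_item, type(via_item).__name__)
print(via_iloc, type(via_iloc).__name__)
```

Both give the same match ID here. The difference is that .values.item() insists on exactly one element, while .iloc[0] silently takes the first row even if several rows match.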

Edit

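If no row of df_melt matches the given criteria, the solutions above can still raise an exception (IndexError from iloc[0], ValueError from .values.item()).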
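
To make the empty-selection case concrete, the sketch below filters a hypothetical toy df_melt with criteria that match nothing and records which exception each variant raises. The helper error_name is introduced only for this illustration.

```python
import pandas as pd

# Hypothetical stand-in for df_melt, using the sample rows from the question.
df_melt = pd.DataFrame({
    "MatchID": [46605, 46605, 46606],
    "GameWeek": [1, 1, 1],
    "Team": ["Liverpool", "Norwich City", "AFC Bournemouth"],
    "AgainstTeam": ["Norwich City", "Liverpool", "Sheffield United"],
})

# Game week 2 does not exist in the toy frame, so nothing is selected.
empty = df_melt.loc[(df_melt["GameWeek"] == 2)
                    & (df_melt["Team"] == "Liverpool"), "MatchID"]


def error_name(func):
    """Illustrative helper: name of the exception raised by func, or None."""
    try:
        func()
    except Exception as exc:
        return type(exc).__name__
    return None


values_item_error = error_name(lambda: empty.values.item())
iloc_error = error_name(lambda: empty.iloc[0])
print(len(empty), values_item_error, iloc_error)
```

Printing the length of the selection just before the failing line in the original loop is a quick way to check whether the filter matched zero rows or more than one.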

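To make your code robust against this case (and return some default value), you can add a function that gets a given attribute (attr, actually a column) from the first row meeting the given criteria (gameweek, team, againstteam):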

def getAttr(gameweek, team, againstteam, attr, default=None):
    xx = df_melt.loc[(df_melt['GameWeek'] == gameweek)
                   & (df_melt['Team'] == team)
                   & (df_melt['AgainstTeam'] == againstteam)]
    return default if xx.empty else xx.iloc[0].loc[attr]

Then, instead of all three ... = df_melt.loc[...].item() instructions, run:

match_id = getAttr(gameweek, team, againstteam, 'MatchID', default=-1)
date     = getAttr(gameweek, team, againstteam, 'Date')
home     = getAttr(gameweek, team, againstteam, 'Home', default='????')
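
If the row-by-row lookup turns out to be slow, one possible alternative (not part of the approach above, and only a sketch on hypothetical toy data) is a left merge on the three key columns. It assumes that Team in df_melt corresponds to ForTeam in df_pm, and that the key columns have the same dtype in both frames.

```python
import pandas as pd

# Hypothetical toy versions of the two frames from the question.
df_pm = pd.DataFrame({
    "Player": ["Alisson", "Virgil van Dijk", "Joseph Gomez"],
    "GameWeek": [1, 1, 1],
    "ForTeam": ["Liverpool", "Liverpool", "Liverpool"],
    "AgainstTeam": ["Norwich City", "Norwich City", "Norwich City"],
})
df_melt = pd.DataFrame({
    "MatchID": [46605, 46605, 46606],
    "GameWeek": [1, 1, 1],
    "Date": ["2019-08-09", "2019-08-09", "2019-08-10"],
    "Team": ["Liverpool", "Norwich City", "AFC Bournemouth"],
    "Home": ["Home", "Away", "Home"],
    "AgainstTeam": ["Norwich City", "Liverpool", "Sheffield United"],
})

# Left merge keeps every player row; unmatched rows get NaN in the new columns.
# validate="many_to_one" raises if a key combination is duplicated in df_melt.
merged = df_pm.merge(
    df_melt.rename(columns={"Team": "ForTeam"}),
    on=["GameWeek", "ForTeam", "AgainstTeam"],
    how="left",
    validate="many_to_one",
)

match_ids = merged["MatchID"].tolist()
home_away = merged["Home"].tolist()
dates = merged["Date"].tolist()
print(match_ids, home_away, dates)
```

Note that merge returns a new frame with a fresh integer index, so the PlayerMatchesDetailID index of df_pm would need to be reset and restored if it matters. Rows without a match come back as NaN, which also turns MatchID into a float column.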
