Pandas can only convert an array of size 1 to a Python scalar

I have this dataframe, df_pm:

                             Player  GameWeek  Minutes  \
PlayerMatchesDetailID                                                 
1             Alisson         1       90   
2     Virgil van Dijk         1       90   
3        Joseph Gomez         1       90 

                             ForTeam               AgainstTeam  \
1                             Liverpool              Norwich City   
2                             Liverpool              Norwich City   
3                             Liverpool              Norwich City  

                             Goals  ShotsOnTarget  ShotsInBox  CloseShots  \
1                             0              0           0           0   
2                             1              1           1           1   
3                             0              0           0           0 

                     TotalShots  Headers  GoalAssists  ShotOnTargetCreated  \
1                             0        0            0                    0   
2                             1        1            0                    0   
3                             0        0            0                    0   

                       ShotInBoxCreated  CloseShotCreated  TotalShotCreated  \
1                             0                 0                 0   
2                             0                 0                 0   
3                             0                 0                 1  

                         HeadersCreated  
1                             0  
2                             0  
3                             0 

and this second dataframe, df_melt:

    MatchID GameWeek        Date                      Team  Home  \
0     46605        1  2019-08-09                 Liverpool  Home   
1     46605        1  2019-08-09              Norwich City  Away   
2     46606        1  2019-08-10           AFC Bournemouth  Home  

                  AgainstTeam  
0                Norwich City  
1                   Liverpool  
2            Sheffield United  
3             AFC Bournemouth  
...
575          Sheffield United  
576          Newcastle United  
577               Southampton

and this snippet, which uses both:

match_ids = []
home_away = []
dates = []

#For each row in the player matches dataframe...
for row in df_pm.itertuples():
    #Look up the match id from the team matches dataframe
    team = row.ForTeam
    againstteam = row.AgainstTeam
    gameweek = row.GameWeek
    print (team,againstteam,gameweek)

    match_id = df_melt.loc[(df_melt['GameWeek']==gameweek)
                          &(df_melt['Team']==team)
                          &(df_melt['AgainstTeam']==againstteam),
                          'MatchID'].item()

    date = df_melt.loc[(df_melt['GameWeek']==gameweek)
                          &(df_melt['Team']==team)
                          &(df_melt['AgainstTeam']==againstteam),
                          'Date'].item()

    home = df_melt.loc[(df_melt['GameWeek']==gameweek)
                          &(df_melt['Team']==team)
                          &(df_melt['AgainstTeam']==againstteam),
                          'Home'].item()

    match_ids.append(match_id)
    home_away.append(home)
    dates.append(date)

On the first iteration, it prints:

Liverpool
Norwich City
1

But I get the error:

Traceback (most recent call last):
  File "tableau_data_generation.py", line 166, in <module>
    'MatchID'].item()
  File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/pandas/core/base.py", line 652, in item
    return self.values.item()
ValueError: can only convert an array of size 1 to a Python scalar

Printing the whole df_melt dataframe, I see that these four datetime values are flawed:

540   46875       28         TBC               Aston Villa  Home   
541   46875       28         TBC          Sheffield United  Away   
...
548   46879       28         TBC           Manchester City  Home   
549   46879       28         TBC                   Arsenal  Away  

How do I fix this?

When you use item() on a Series, you should actually receive:

FutureWarning: `item` has been deprecated and will be removed in a future version

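Since item() was deprecated in version 0.25.0 (the deprecation was later reverted in Pandas 1.0), it looks like you are using an outdated version of Pandas, so you should probably start by upgrading it.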

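Even in newer versions of Pandas you can still use item(), but on the underlying Numpy array (where, at least for now, it is not deprecated). So change your code to: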

df_melt.loc[...].values.item()
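
Keep in mind that item() only succeeds when the selection holds exactly one element, which is what the ValueError in the question is complaining about. Here is a minimal sketch of both failure modes (no matching row, and more than one matching row). The frame df_demo and the two helper functions are made up for illustration and are not part of the original code:

```python
import pandas as pd

# Hypothetical stand-in for df_melt; the duplicated Liverpool row is deliberate.
df_demo = pd.DataFrame({
    "MatchID": [46605, 46605, 46605],
    "GameWeek": [1, 1, 1],
    "Team": ["Liverpool", "Norwich City", "Liverpool"],
    "AgainstTeam": ["Norwich City", "Liverpool", "Norwich City"],
})

def build_mask(gameweek, team, againstteam):
    """Boolean mask of the rows meeting all three conditions."""
    return ((df_demo["GameWeek"] == gameweek)
            & (df_demo["Team"] == team)
            & (df_demo["AgainstTeam"] == againstteam))

def match_id_or_message(gameweek, team, againstteam):
    """Return the single MatchID, or a message saying how many rows matched."""
    mask = build_mask(gameweek, team, againstteam)
    try:
        return df_demo.loc[mask, "MatchID"].values.item()
    except ValueError:
        # Raised by Numpy whenever the selection does not hold exactly 1 element.
        return "expected 1 matching row, found {}".format(int(mask.sum()))

print(match_id_or_message(1, "Norwich City", "Liverpool"))  # exactly one row
print(match_id_or_message(1, "Liverpool", "Norwich City"))  # two rows
print(match_id_or_message(2, "Liverpool", "Norwich City"))  # no rows
```

Printing mask.sum() for the failing row in your own loop should tell you whether df_melt has zero or several matching rows. A dtype mismatch between the two frames (for example GameWeek stored as text in one and as integers in the other) is one possible reason for zero matches.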

Another option is to use iloc[0], so you can also change your code to:

df_melt.loc[...].iloc[0]
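
Note that iloc[0] is not a drop-in equivalent of item(): it silently returns the first element when several rows match, and it raises IndexError (not ValueError) when nothing matches. A small sketch with two made-up Series standing in for the result of df_melt.loc[...]:

```python
import pandas as pd

# Hypothetical selections: one with two matches, one with none.
s_many = pd.Series([46605, 46605])
s_empty = pd.Series([], dtype="int64")

# With several matches, iloc[0] just takes the first one without complaint.
first = s_many.iloc[0]

# With no matches, iloc[0] raises IndexError.
try:
    s_empty.iloc[0]
    outcome = "no error"
except IndexError:
    outcome = "IndexError"

print(first, outcome)
```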

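Edit

The solutions above can still raise an exception (IndexError) if df_melt contains no row meeting the given criteria.

To make your code resistant to such a case (and return some default value), you can add a function that gets the given attribute (attr, actually a column) from the first row meeting the given criteria (gameweek, team and againstteam):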

def getAttr(gameweek, team, againstteam, attr, default=None):
    xx = df_melt.loc[(df_melt['GameWeek'] == gameweek)
                   & (df_melt['Team'] == team)
                   & (df_melt['AgainstTeam'] == againstteam)]
    return default if xx.empty else xx.iloc[0].loc[attr]

Then, instead of all 3 ... = df_melt.loc[...].item() instructions, run:

match_id = getAttr(gameweek, team, againstteam, 'MatchID', default=-1)
date     = getAttr(gameweek, team, againstteam, 'Date')
home     = getAttr(gameweek, team, againstteam, 'Home', default='????')
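
If you would rather avoid the row-by-row loop altogether, the same lookup can be expressed as a left merge. This is a different technique from the function above, sketched here with small made-up frames (df_pm_demo, df_melt_demo) in place of your real data. The validate="many_to_one" argument makes Pandas raise if the lookup side contains duplicate keys, and rows of the left frame without a match get NaN instead of an exception:

```python
import pandas as pd

# Hypothetical miniature versions of df_pm and df_melt.
df_pm_demo = pd.DataFrame({
    "Player": ["Alisson", "Virgil van Dijk", "Unknown Player"],
    "GameWeek": [1, 1, 2],
    "ForTeam": ["Liverpool", "Liverpool", "Liverpool"],
    "AgainstTeam": ["Norwich City", "Norwich City", "Southampton"],
})
df_melt_demo = pd.DataFrame({
    "MatchID": [46605, 46605],
    "GameWeek": [1, 1],
    "Date": ["2019-08-09", "2019-08-09"],
    "Team": ["Liverpool", "Norwich City"],
    "Home": ["Home", "Away"],
    "AgainstTeam": ["Norwich City", "Liverpool"],
})

# Rename Team so both frames share the same key column names.
lookup = df_melt_demo.rename(columns={"Team": "ForTeam"})[
    ["GameWeek", "ForTeam", "AgainstTeam", "MatchID", "Date", "Home"]
]

merged = df_pm_demo.merge(
    lookup,
    how="left",                      # keep every player row
    on=["GameWeek", "ForTeam", "AgainstTeam"],
    validate="many_to_one",          # fail loudly on duplicate lookup keys
)

print(merged[["Player", "MatchID", "Date", "Home"]])
```

The key columns must have the same dtype in both frames for the merge to find matches, so it is worth comparing df_pm.dtypes and df_melt.dtypes first.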
