
Pandas can only convert an array of size 1 to a Python scalar

I have this dataframe, df_pm:

                                Player  GameWeek  Minutes  \
PlayerMatchesDetailID
1                              Alisson         1       90
2                      Virgil van Dijk         1       90
3                         Joseph Gomez         1       90

                         ForTeam   AgainstTeam  \
1                      Liverpool  Norwich City
2                      Liverpool  Norwich City
3                      Liverpool  Norwich City

                       Goals  ShotsOnTarget  ShotsInBox  CloseShots  \
1                          0              0           0           0
2                          1              1           1           1
3                          0              0           0           0

                       TotalShots  Headers  GoalAssists  ShotOnTargetCreated  \
1                               0        0            0                    0
2                               1        1            0                    0
3                               0        0            0                    0

                       ShotInBoxCreated  CloseShotCreated  TotalShotCreated  \
1                                     0                 0                 0
2                                     0                 0                 0
3                                     0                 0                 1

                       HeadersCreated
1                                   0
2                                   0
3                                   0

this second dataframe, df_melt:

    MatchID GameWeek        Date                      Team  Home  \
0     46605        1  2019-08-09                 Liverpool  Home   
1     46605        1  2019-08-09              Norwich City  Away   
2     46606        1  2019-08-10           AFC Bournemouth  Home  

                  AgainstTeam  
0                Norwich City  
1                   Liverpool  
2            Sheffield United  
3             AFC Bournemouth  
...
575          Sheffield United  
576          Newcastle United  
577               Southampton

and this snippet, which uses both:

match_ids = []
home_away = []
dates = []

# For each row in the player matches dataframe...
for row in df_pm.itertuples():
    # Look up the match id from the team matches dataframe
    team = row.ForTeam
    againstteam = row.AgainstTeam
    gameweek = row.GameWeek
    print(team, againstteam, gameweek)

    match_id = df_melt.loc[(df_melt['GameWeek']==gameweek)
                          &(df_melt['Team']==team)
                          &(df_melt['AgainstTeam']==againstteam),
                          'MatchID'].item()

    date = df_melt.loc[(df_melt['GameWeek']==gameweek)
                          &(df_melt['Team']==team)
                          &(df_melt['AgainstTeam']==againstteam),
                          'Date'].item()

    home = df_melt.loc[(df_melt['GameWeek']==gameweek)
                          &(df_melt['Team']==team)
                          &(df_melt['AgainstTeam']==againstteam),
                          'Home'].item()

    match_ids.append(match_id)
    home_away.append(home)
    dates.append(date)

On the first iteration, the print call outputs:

Liverpool Norwich City 1

But I'm getting the error:

Traceback (most recent call last):
  File "tableau_data_generation.py", line 166, in <module>
    'MatchID'].item()
  File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/pandas/core/base.py", line 652, in item
    return self.values.item()
ValueError: can only convert an array of size 1 to a Python scalar

Printing the whole df_melt dataframe, I see that these four Date values are flawed (TBC instead of a date):

540   46875       28         TBC               Aston Villa  Home   
541   46875       28         TBC          Sheffield United  Away   
...
548   46879       28         TBC           Manchester City  Home   
549   46879       28         TBC                   Arsenal  Away  

How do I fix this?

When you call item() on a Series, you should actually have received this warning:

FutureWarning: `item` has been deprecated and will be removed in a future version

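Since Series.item() was deprecated in version 0.25.0, it looks like you are using an older version of Pandas, and upgrading it is probably a good first step. (The deprecation was reverted in pandas 1.0.0, so Series.item() is available again in current versions.)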

In any version, you can instead call item() on the underlying NumPy array, where it is not deprecated. So change your code to:

df_melt.loc[...].values.item()

Another option is iloc[0], which returns the first element of the selection, so you can also change your code to:

df_melt.loc[...].iloc[0]
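A minimal sketch of both options on a hypothetical miniature of df_melt (values copied from the first rows printed in the question, columns reduced to those used here). When the filter matches exactly one row, both return the same value; .values.item() gives a plain Python int, while .iloc[0] gives a NumPy scalar:

```python
import pandas as pd

# Hypothetical miniature of df_melt, built from the first rows printed above
df_melt = pd.DataFrame({
    'MatchID': [46605, 46605, 46606],
    'GameWeek': [1, 1, 1],
    'Date': ['2019-08-09', '2019-08-09', '2019-08-10'],
    'Team': ['Liverpool', 'Norwich City', 'AFC Bournemouth'],
    'Home': ['Home', 'Away', 'Home'],
    'AgainstTeam': ['Norwich City', 'Liverpool', 'Sheffield United'],
})

# Boolean mask for the fixture the first loop iteration looks for
mask = ((df_melt['GameWeek'] == 1)
        & (df_melt['Team'] == 'Liverpool')
        & (df_melt['AgainstTeam'] == 'Norwich City'))

via_values = df_melt.loc[mask, 'MatchID'].values.item()  # ndarray.item() -> Python int
via_iloc = df_melt.loc[mask, 'MatchID'].iloc[0]          # first element -> numpy.int64

print(via_values, via_iloc)  # 46605 46605
```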

Edit

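Both solutions above can still raise an exception when the filter does not match exactly one row, which is what the original ValueError is really telling you: .values.item() raises the same ValueError for zero or several matching rows, and iloc[0] raises IndexError if df_melt has no row meeting the given criteria (with several matches it silently returns the first one).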
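To see these failure modes side by side, here is a small sketch on a hypothetical two-element Series standing in for the MatchID column; outcome() is a helper introduced only for this illustration:

```python
import pandas as pd

# Hypothetical stand-in for df_melt['MatchID']
match_ids = pd.Series([46605, 46606], name='MatchID')

empty = match_ids[match_ids == 0]  # a filter that matches no row
both = match_ids[match_ids > 0]    # a filter that matches two rows

def outcome(func):
    """Return func()'s result, or the name of the exception it raised."""
    try:
        return func()
    except (ValueError, IndexError) as exc:
        return type(exc).__name__

print(outcome(lambda: empty.values.item()))  # ValueError
print(outcome(lambda: empty.iloc[0]))        # IndexError
print(outcome(lambda: both.values.item()))   # ValueError
print(outcome(lambda: both.iloc[0]))         # 46605 (first match, no error)
```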

To make your code resistant to such cases (and return a default value instead), you can add a function that gets the given attribute (attr, actually a column) from the first row meeting the given criteria (gameweek, team, and againstteam):

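def getAttr(gameweek, team, againstteam, attr, default=None):
    # Select all rows of df_melt that match the fixture
    xx = df_melt.loc[(df_melt['GameWeek'] == gameweek)
                   & (df_melt['Team'] == team)
                   & (df_melt['AgainstTeam'] == againstteam)]
    # No match: return the default; otherwise take attr from the first matching row
    return default if xx.empty else xx.iloc[0].loc[attr]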
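A quick usage sketch, assuming a hypothetical two-row df_melt taken from the question's printout; getAttr is repeated unchanged so the block runs on its own. A fixture that exists returns its value, and one that does not falls back to the default instead of raising:

```python
import pandas as pd

# Hypothetical miniature df_melt with the columns getAttr reads
df_melt = pd.DataFrame({
    'MatchID': [46605, 46605],
    'GameWeek': [1, 1],
    'Date': ['2019-08-09', '2019-08-09'],
    'Team': ['Liverpool', 'Norwich City'],
    'Home': ['Home', 'Away'],
    'AgainstTeam': ['Norwich City', 'Liverpool'],
})

def getAttr(gameweek, team, againstteam, attr, default=None):
    # Same logic as above: first matching row's attr, or default if none match
    xx = df_melt.loc[(df_melt['GameWeek'] == gameweek)
                   & (df_melt['Team'] == team)
                   & (df_melt['AgainstTeam'] == againstteam)]
    return default if xx.empty else xx.iloc[0].loc[attr]

print(getAttr(1, 'Liverpool', 'Norwich City', 'MatchID', default=-1))  # 46605
print(getAttr(2, 'Liverpool', 'Norwich City', 'MatchID', default=-1))  # -1 (no such fixture)
print(getAttr(1, 'Norwich City', 'Liverpool', 'Home', default='????'))  # Away
```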

Then, instead of the three ... = df_melt.loc[...].item() statements, run:

match_id = getAttr(gameweek, team, againstteam, 'MatchID', default=-1)
date     = getAttr(gameweek, team, againstteam, 'Date')
home     = getAttr(gameweek, team, againstteam, 'Home', default='????')
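As an alternative not used in the answer above, the row-by-row lookup can be replaced with a single left merge (DataFrame.merge). This sketch assumes df_melt should have at most one row per (GameWeek, Team, AgainstTeam) key: validate='many_to_one' makes pandas raise MergeError if that assumption is violated, and indicator=True marks player rows that found no fixture. Both frames are hypothetical miniatures; 'Player X', 'Team A' and 'Team B' are invented to show an unmatched row:

```python
import pandas as pd

# Hypothetical miniatures of the two frames, reduced to the columns the lookup needs
df_pm = pd.DataFrame({
    'Player': ['Alisson', 'Virgil van Dijk', 'Player X'],
    'GameWeek': [1, 1, 2],
    'ForTeam': ['Liverpool', 'Liverpool', 'Team A'],
    'AgainstTeam': ['Norwich City', 'Norwich City', 'Team B'],
})
df_melt = pd.DataFrame({
    'MatchID': [46605, 46605],
    'GameWeek': [1, 1],
    'Date': ['2019-08-09', '2019-08-09'],
    'Team': ['Liverpool', 'Norwich City'],
    'Home': ['Home', 'Away'],
    'AgainstTeam': ['Norwich City', 'Liverpool'],
})

# Rename Team -> ForTeam so both frames share the three key columns
lookup = df_melt.rename(columns={'Team': 'ForTeam'})[
    ['GameWeek', 'ForTeam', 'AgainstTeam', 'MatchID', 'Date', 'Home']]

merged = df_pm.merge(
    lookup,
    on=['GameWeek', 'ForTeam', 'AgainstTeam'],
    how='left',              # keep every player row, matched or not
    validate='many_to_one',  # MergeError if a key appears more than once in lookup
    indicator=True,          # adds '_merge': 'both' or 'left_only'
)
print(merged[['Player', 'MatchID', 'Date', 'Home', '_merge']])

# Player rows whose fixture could not be found in df_melt
missing = merged.loc[merged['_merge'] == 'left_only',
                     ['Player', 'GameWeek', 'ForTeam', 'AgainstTeam']]
print(missing)
```

Rows marked left_only are the ones for which the loop's .item() call would fail because nothing matched. Note that merge returns a new RangeIndex, so call df_pm.reset_index() first if you need to keep PlayerMatchesDetailID.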
