
Pandas ValueError: can only convert an array of size 1 to a Python scalar

With the following code:

#Bring in the 'player matches' dataframe
df_pm = sql('select * from PlayerMatchesDetail', c).drop('TableIndex', axis=1)
df_pm['GoalInv'] = df_pm['Goals']+df_pm['GoalAssists']

df_pm.head(3) # THIS PRINTS FINE (see below)

# We need to associate a match ID to each row here, so that we can groupby properly.    
def MatchIDLookup(gw, ht, at):
    '''
    Takes a gameweek, hometeam, and awayteam,
    and returns the matchID of the game
    '''
    return int(df_fixtures.loc[(df_fixtures['GameWeek']==gw)
                  &(((df_fixtures['HomeTeam']==ht)
                     &(df_fixtures['AwayTeam']==at))
                   |((df_fixtures['HomeTeam']==at)
                     &(df_fixtures['AwayTeam']==ht))),'MatchID'].item())

#Apply the function to insert the matchID
df_pm['MatchID'] = df_pm.apply(lambda x: MatchIDLookup(x['GameWeek'],
                                                       x['ForTeam'],
                                                       x['AgainstTeam']), axis=1)

#Create a multi-index
df_pm.set_index(['MatchID','Player'], inplace=True)

#We now create columns in the player match dataframe, describing their expected goals, assists, and goal involvement.

#Goals
df_pm['XG'] = df.groupby(['MatchID','Player']).sum()[['XG']]
#Assists
df_pm['XA'] = df.groupby(['MatchID','AssistedBy']).sum()[['XG']]

#Fill NAs with 0s
df_pm.fillna(0, inplace=True)

#Calculate goal Involvement
df_pm['XGI'] = df_pm['XG'] + df_pm['XA']

# Let's see how player gameweeks are distributed...
plt.figure(figsize=(10,3))
plt.hist(df_pm['XG'], label='XG', bins=30)

plt.xlim(0)
plt.ylim(0,1000)
plt.title('Distribution of player XG in each match')

plt.figure(figsize=(10,3))
plt.hist(df_pm['XA'], label='XGA', bins=30, color=color_list[1])

plt.xlim(0)
plt.ylim(0,1000)
plt.title('Distribution of player XA in each match')

plt.figure(figsize=(10,3))
plt.hist(df_pm['XGI'], label='XGI', bins=30, color=color_list[2])

plt.xlim(0)
plt.ylim(0,1000)
plt.title('Distribution of player XGI in each match');
plt.show()

I am getting the following traceback:

Traceback (most recent call last):
  File "expected_goals.py", line 365, in <module>
    x['AgainstTeam']), axis=1)
  File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/pandas/core/frame.py", line 6878, in apply
    return op.get_result()
  File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/pandas/core/apply.py", line 186, in get_result
    return self.apply_standard()
  File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/pandas/core/apply.py", line 296, in apply_standard
    values, self.f, axis=self.axis, dummy=dummy, labels=labels
  File "pandas/_libs/reduction.pyx", line 620, in pandas._libs.reduction.compute_reduction
  File "pandas/_libs/reduction.pyx", line 128, in pandas._libs.reduction.Reducer.get_result
  File "expected_goals.py", line 365, in <lambda>
    x['AgainstTeam']), axis=1)
  File "expected_goals.py", line 360, in MatchIDLookup
    &(df_fixtures['AwayTeam']==ht))),'MatchID'].item())
  File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/pandas/core/base.py", line 652, in item
    return self.values.item()
ValueError: can only convert an array of size 1 to a Python scalar

Notes:

df_fixtures prints fine:

                 MatchID  GameWeek       Date        HomeTeam                 AwayTeam
FixturesBasicID                                                                      
1                 46605         1 2019-08-09       Liverpool             Norwich City
2                 46606         1 2019-08-10     Bournemouth         Sheffield United
3                 46607         1 2019-08-10         Burnley              Southampton
4                 46608         1 2019-08-10  Crystal Palace                  Everton
5                 46609         1 2019-08-11  Leicester City  Wolverhampton Wanderers

And, before using MatchIDLookup(), df_pm.head(3) also prints fine:

                                Player  GameWeek  Minutes    ForTeam  ... CreatedCentre  CreatedLeft  CreatedRight  GoalInv
PlayerMatchesDetailID                                                 ...                                                  
1                              Alisson         1       90  Liverpool  ...             0            0             0        0
2                      Virgil van Dijk         1       90  Liverpool  ...             0            0             0        1
3                         Joseph Gomez         1       90  Liverpool  ...             0            0             0        0

How do I fix this?

Without trying it out, my first suspect was the int() in the return of MatchIDLookup(). The cast is not needed inside the function: return the value without conversion and cast the whole column once apply() has finished:

df_pm['MatchID'] = df_pm['MatchID'].astype(int)
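A minimal sketch of this return-then-cast pattern, assuming a toy df_fixtures built from two of the rows printed in the question and a made-up two-row df_pm; lookup_match_id is a hypothetical stand-in for MatchIDLookup(). This pattern on its own does not stop .item() from raising when a row has no unique match; it only moves the cast out of the per-row function.

```python
import pandas as pd

# Toy fixtures table (two rows copied from the question's printout)
df_fixtures = pd.DataFrame({
    'MatchID': [46605, 46606],
    'GameWeek': [1, 1],
    'HomeTeam': ['Liverpool', 'Bournemouth'],
    'AwayTeam': ['Norwich City', 'Sheffield United'],
})

# Hypothetical stand-in for MatchIDLookup(): no int() around the result
def lookup_match_id(gw, ht, at):
    mask = (df_fixtures['GameWeek'] == gw) & (
        ((df_fixtures['HomeTeam'] == ht) & (df_fixtures['AwayTeam'] == at))
        | ((df_fixtures['HomeTeam'] == at) & (df_fixtures['AwayTeam'] == ht))
    )
    return df_fixtures.loc[mask, 'MatchID'].item()

# Made-up player-match rows, one per team in each fixture
df_pm = pd.DataFrame({
    'GameWeek': [1, 1],
    'ForTeam': ['Liverpool', 'Sheffield United'],
    'AgainstTeam': ['Norwich City', 'Bournemouth'],
})

df_pm['MatchID'] = df_pm.apply(
    lambda x: lookup_match_id(x['GameWeek'], x['ForTeam'], x['AgainstTeam']),
    axis=1,
)
# Cast the whole column once, after apply() has finished
df_pm['MatchID'] = df_pm['MatchID'].astype(int)
print(df_pm['MatchID'].tolist())  # [46605, 46606]
```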

PS: I would also generally advise against converting any kind of ID to an integer and suggest keeping IDs as strings instead. The simple reason: if an ID starts with zeros (0654 or 0012), converting it to an integer drops them and you lose the four-digit format.
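A quick illustration of that point, using hypothetical four-digit IDs (the MatchIDs in this question happen to have no leading zeros):

```python
import pandas as pd

# Hypothetical IDs with leading zeros, stored as strings
ids = pd.Series(['0654', '0012'])

as_int = ids.astype(int)
print(as_int.tolist())  # [654, 12]: the leading zeros are gone

# Converting back to strings does not bring them back
back_to_str = as_int.astype(str)
print(back_to_str.tolist())  # ['654', '12']

# Only re-padding recovers them, and that assumes the original width (4 here)
repadded = back_to_str.str.zfill(4)
print(repadded.tolist())  # ['0654', '0012']
```

The zfill(4) step only works if every ID originally had exactly four characters; keeping the column as strings from the start avoids needing that assumption.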

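EDIT: Looking at the traceback again, the error is raised by .item() (the last frames go through base.py item), which only works when the selection contains exactly one row. If a player row's GameWeek/ForTeam/AgainstTeam combination matches no fixture, or more than one, you get this ValueError. Check the size of the selection before calling it: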

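def MatchIDLookup(gw, ht, at):
    '''
    Returns the matchID of the game in gameweek gw between ht and at
    (in either home/away order), or 'not found' unless exactly one fixture matches
    '''
    res = df_fixtures.loc[(df_fixtures['GameWeek']==gw)
                  &(((df_fixtures['HomeTeam']==ht)
                     &(df_fixtures['AwayTeam']==at))
                   |((df_fixtures['HomeTeam']==at)
                     &(df_fixtures['AwayTeam']==ht))),'MatchID']

    # .item() raises unless the selection has exactly one row
    return res.item() if len(res) == 1 else 'not found'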
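A self-contained sketch that reproduces the original ValueError and exercises a guarded lookup, assuming a toy df_fixtures with two rows copied from the question plus a hypothetical duplicate Liverpool fixture (MatchID 99999) to show the more-than-one-row case. The guard here checks for exactly one row, since .item() fails for duplicates as well as for empty selections.

```python
import pandas as pd

# Two rows copied from the question's printout, plus a hypothetical
# duplicate (MatchID 99999) of the Liverpool fixture
df_fixtures = pd.DataFrame({
    'MatchID': [46605, 46606, 99999],
    'GameWeek': [1, 1, 1],
    'HomeTeam': ['Liverpool', 'Bournemouth', 'Liverpool'],
    'AwayTeam': ['Norwich City', 'Sheffield United', 'Norwich City'],
})

def MatchIDLookup(gw, ht, at):
    # Fixture in gameweek gw between ht and at, in either home/away order
    res = df_fixtures.loc[(df_fixtures['GameWeek'] == gw)
                          & (((df_fixtures['HomeTeam'] == ht)
                              & (df_fixtures['AwayTeam'] == at))
                             | ((df_fixtures['HomeTeam'] == at)
                                & (df_fixtures['AwayTeam'] == ht))), 'MatchID']
    # .item() is only safe when exactly one row matched
    return res.item() if len(res) == 1 else 'not found'

# Reproduce the original error: .item() on selections with 0 and 2 rows
no_rows = df_fixtures.loc[df_fixtures['GameWeek'] == 2, 'MatchID']
two_rows = df_fixtures.loc[df_fixtures['HomeTeam'] == 'Liverpool', 'MatchID']
errors = []
for sel in (no_rows, two_rows):
    try:
        sel.item()
    except ValueError as exc:
        errors.append(str(exc))
print(errors)

# The guarded lookup returns the ID or the 'not found' marker instead of raising
print(MatchIDLookup(1, 'Sheffield United', 'Bournemouth'))  # 46606
print(MatchIDLookup(2, 'Liverpool', 'Norwich City'))        # not found
print(MatchIDLookup(1, 'Liverpool', 'Norwich City'))        # not found
```

After applying the guarded function, df_pm[df_pm['MatchID'] == 'not found'] lists the player rows whose fixture could not be matched uniquely. The astype(int) cast from the first part of the answer will only succeed once no such rows remain.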
