Pandas ValueError: can only convert an array of size 1 to a Python scalar
With the following code:
#Bring in the 'player matches' dataframe
df_pm = sql('select * from PlayerMatchesDetail', c).drop('TableIndex', axis=1)
df_pm['GoalInv'] = df_pm['Goals']+df_pm['GoalAssists']
df_pm.head(3) # THIS PRINTS FINE (see below)
# We need to associate a match ID to each row here, so that we can groupby properly.
def MatchIDLookup(gw, ht, at):
    '''
    Takes a gameweek, hometeam, and awayteam,
    and returns the matchID of the game
    '''
    return int(df_fixtures.loc[(df_fixtures['GameWeek']==gw)
                               &(((df_fixtures['HomeTeam']==ht)
                                  &(df_fixtures['AwayTeam']==at))
                                 |((df_fixtures['HomeTeam']==at)
                                   &(df_fixtures['AwayTeam']==ht))),'MatchID'].item())
#Apply the function to insert the matchID
df_pm['MatchID'] = df_pm.apply(lambda x: MatchIDLookup(x['GameWeek'],
                                                       x['ForTeam'],
                                                       x['AgainstTeam']), axis=1)
#Create a multi-index
df_pm.set_index(['MatchID','Player'], inplace=True)
#We now create columns in the player match dataframe, describing their expected goals, assists, and goal involvement.
#Goals
df_pm['XG'] = df.groupby(['MatchID','Player']).sum()[['XG']]
#Assists
df_pm['XA'] = df.groupby(['MatchID','AssistedBy']).sum()[['XG']]
#Fill NAs with 0s
df_pm.fillna(0, inplace=True)
#Calculate goal Involvement
df_pm['XGI'] = df_pm['XG'] + df_pm['XA']
# Let's see how player gameweeks are distributed...
plt.figure(figsize=(10,3))
plt.hist(df_pm['XG'], label='XG', bins=30)
plt.xlim(0)
plt.ylim(0,1000)
plt.title('Distribution of player XG in each match')
plt.figure(figsize=(10,3))
plt.hist(df_pm['XA'], label='XGA', bins=30, color=color_list[1])
plt.xlim(0)
plt.ylim(0,1000)
plt.title('Distribution of player XA in each match')
plt.figure(figsize=(10,3))
plt.hist(df_pm['XGI'], label='XGI', bins=30, color=color_list[2])
plt.xlim(0)
plt.ylim(0,1000)
plt.title('Distribution of player XGI in each match');
plt.show()
I am getting the following traceback:
Traceback (most recent call last):
  File "expected_goals.py", line 365, in <module>
    x['AgainstTeam']), axis=1)
  File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/pandas/core/frame.py", line 6878, in apply
    return op.get_result()
  File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/pandas/core/apply.py", line 186, in get_result
    return self.apply_standard()
  File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/pandas/core/apply.py", line 296, in apply_standard
    values, self.f, axis=self.axis, dummy=dummy, labels=labels
  File "pandas/_libs/reduction.pyx", line 620, in pandas._libs.reduction.compute_reduction
  File "pandas/_libs/reduction.pyx", line 128, in pandas._libs.reduction.Reducer.get_result
  File "expected_goals.py", line 365, in <lambda>
    x['AgainstTeam']), axis=1)
  File "expected_goals.py", line 360, in MatchIDLookup
    &(df_fixtures['AwayTeam']==ht))),'MatchID'].item())
  File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/pandas/core/base.py", line 652, in item
    return self.values.item()
ValueError: can only convert an array of size 1 to a Python scalar
Notes:
df_fixtures prints fine:
                 MatchID  GameWeek        Date        HomeTeam                 AwayTeam
FixturesBasicID
1                  46605         1  2019-08-09       Liverpool             Norwich City
2                  46606         1  2019-08-10     Bournemouth         Sheffield United
3                  46607         1  2019-08-10         Burnley              Southampton
4                  46608         1  2019-08-10  Crystal Palace                  Everton
5                  46609         1  2019-08-11  Leicester City  Wolverhampton Wanderers
And, before using MatchIDLookup(), df_pm.head(3) also prints fine:
                                Player  GameWeek  Minutes    ForTeam  ...  CreatedCentre  CreatedLeft  CreatedRight  GoalInv
PlayerMatchesDetailID                                                 ...
1                              Alisson         1       90  Liverpool  ...              0            0             0        0
2                      Virgil van Dijk         1       90  Liverpool  ...              0            0             0        1
3                         Joseph Gomez         1       90  Liverpool  ...              0            0             0        0
How do I fix this?
Without trying it out, I believe the issue is the int() in the return of the MatchIDLookup() function. Pandas usually doesn't allow this. Instead, return the value without converting it to int, and then add the line below:
df_pm['MatchID'] = df_pm['MatchID'].astype(int)
PS: Also, I would generally advise against converting any kind of ID to an integer; keep IDs as strings instead. The simple reason: if an ID starts with zero (0654 or 0012), converting it to an integer loses the 4-digit format.
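The leading-zero point can be seen in a small sketch. The ID values here are made up for illustration, and padding back with zfill only works when the original width is known:

```python
import pandas as pd

# Hypothetical IDs stored as text; these values are made up for illustration.
ids = pd.Series(["0654", "0012", "1234"])

# Converting to int drops the leading zeros.
as_int = ids.astype(int)

# Padding back is only possible if the original width (4 here) is known.
restored = as_int.astype(str).str.zfill(4)

print(as_int.tolist())
print(restored.tolist())
```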
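EDIT: The traceback shows the error is raised by .item(), which requires the selection to contain exactly one element, so check the lookup result before calling it: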
def MatchIDLookup(gw, ht, at):
    res = df_fixtures.loc[(df_fixtures['GameWeek']==gw)
                          &(((df_fixtures['HomeTeam']==ht)
                             &(df_fixtures['AwayTeam']==at))
                            |((df_fixtures['HomeTeam']==at)
                              &(df_fixtures['AwayTeam']==ht))),'MatchID']
    return res.item() if len(res) > 0 else 'not found'
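One way to find out which rows trigger the error is to make the lookup return a missing value whenever the selection does not contain exactly one fixture, and then inspect those rows. The following is a minimal sketch, not the original code: df_fixtures is a cut-down copy of two rows from the question, the df_pm rows (Player A/B/C and the team name 'Norwich') are made up, and match_id_lookup is an illustrative name.

```python
import pandas as pd

# Cut-down stand-in for df_fixtures, using two rows shown in the question.
df_fixtures = pd.DataFrame({
    "MatchID": [46605, 46606],
    "GameWeek": [1, 1],
    "HomeTeam": ["Liverpool", "Bournemouth"],
    "AwayTeam": ["Norwich City", "Sheffield United"],
})

# Made-up player rows; the last one uses a team name missing from the fixtures.
df_pm = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C"],
    "GameWeek": [1, 1, 1],
    "ForTeam": ["Liverpool", "Norwich City", "Norwich"],
    "AgainstTeam": ["Norwich City", "Liverpool", "Liverpool"],
})

def match_id_lookup(gw, ht, at):
    """Return the MatchID if exactly one fixture matches, else None."""
    same_week = df_fixtures["GameWeek"] == gw
    home_away = (df_fixtures["HomeTeam"] == ht) & (df_fixtures["AwayTeam"] == at)
    away_home = (df_fixtures["HomeTeam"] == at) & (df_fixtures["AwayTeam"] == ht)
    res = df_fixtures.loc[same_week & (home_away | away_home), "MatchID"]
    # .item() raises ValueError unless the selection has exactly one element.
    return res.item() if len(res) == 1 else None

df_pm["MatchID"] = df_pm.apply(
    lambda x: match_id_lookup(x["GameWeek"], x["ForTeam"], x["AgainstTeam"]),
    axis=1,
)

# Rows that could not be matched to exactly one fixture.
unmatched = df_pm[df_pm["MatchID"].isna()]
print(unmatched[["Player", "GameWeek", "ForTeam", "AgainstTeam"]])
```

If unmatched is not empty, the usual suspects are team names spelled differently in the two tables or a GameWeek value with no corresponding fixture.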