Pandas - can only convert an array of size 1 to a Python scalar

I have two dataframes:

df_melt

    MatchID GameWeek        Date                      Team  Home               AgainstTeam
0     46605        1  2019-08-09                 Liverpool  Home              Norwich City
1     46605        1  2019-08-09              Norwich City  Away                 Liverpool
2     46606        1  2019-08-10           AFC Bournemouth  Home          Sheffield United
3     46606        1  2019-08-10          Sheffield United  Away           AFC Bournemouth
4     46607        1  2019-08-10                   Burnley  Home               Southampton
..      ...      ...         ...                       ...   ...                       ...
533   46871       27  2020-02-23                   Watford  Away         Manchester United
534   46872       27  2020-02-22          Sheffield United  Home  Brighton and Hove Albion
535   46872       27  2020-02-22  Brighton and Hove Albion  Away          Sheffield United
536   46873       27  2020-02-22               Southampton  Home               Aston Villa
537   46873       27  2020-02-22               Aston Villa  Away               Southampton

And, for player matches, df_pm:

                                       Player  GameWeek  Minutes  ... CloseShotCreated TotalShotCreated  HeadersCreated
PlayerMatchesDetailID                                             ...                                                  
1                                     Alisson         1       90  ...                0                0               0
2                             Virgil van Dijk         1       90  ...                0                0               0
3                                Joseph Gomez         1       90  ...                0                1               0
4                            Andrew Robertson         1       90  ...                0                1               0
5                      Trent Alexander-Arnold         1       90  ...                3                3               1
...                                       ...       ...      ...  ...              ...              ...             ...
15053                             Matty James        22        0  ...                0                0               0
15054                             Matty James        23        0  ...                0                0               0
15055                             Matty James        24        0  ...                0                0               0
15056                             Matty James        25        0  ...                0                0               0
15057                             Matty James        26        0  ...                0                0               0

Now I am trying to iterate over df_pm and look up items in df_melt under certain conditions, like so:

#Instantiate an empty list
match_ids = []
home_away = []
dates = []

#For each row in the player matches dataframe...
for row in df_pm.itertuples():
    #Look up the match id from the team matches dataframe
    team = row.ForTeam
    againstteam = row.AgainstTeam
    gameweek = row.GameWeek

    match_id = df_melt.loc[(df_melt['GameWeek']==gameweek)
                          &(df_melt['Team']==team)
                          &(df_melt['AgainstTeam']==againstteam),
                          'MatchID'].item()

    date = df_melt.loc[(df_melt['GameWeek']==gameweek)
                          &(df_melt['Team']==team)
                          &(df_melt['AgainstTeam']==againstteam),
                          'Date'].item()

    home = df_melt.loc[(df_melt['GameWeek']==gameweek)
                          &(df_melt['Team']==team)
                          &(df_melt['AgainstTeam']==againstteam),
                          'Home'].item()

    #Add it to the list
    match_ids.append(match_id)
    home_away.append(home)
    dates.append(date)

But on every iteration, even though I can print team, againstteam and gameweek, I get the following error:

Traceback (most recent call last):
  File "tableau_data_generation.py", line 155, in <module>
    'MatchID'].item()
  File "/Users/me/anaconda2/envs/data_science/lib/python3.7/site-packages/pandas/core/base.py", line 652, in item
    return self.values.item()
ValueError: can only convert an array of size 1 to a Python scalar

...which suggests that the item does not exist.
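The message itself only says that the selection did not hold exactly one element, which covers both "no match" and "several matches". Below is a minimal sketch of how one might count the matching rows for a given key before calling .item(). The names df_melt_demo and count_matches are hypothetical, and the sample rows are made up to mirror the layout shown above:

```python
import pandas as pd

# Hypothetical miniature of df_melt (illustrative rows only).
df_melt_demo = pd.DataFrame({
    "MatchID": [46605, 46605, 46606],
    "GameWeek": [1, 1, 1],
    "Team": ["Liverpool", "Norwich City", "AFC Bournemouth"],
    "AgainstTeam": ["Norwich City", "Liverpool", "Sheffield United"],
})

def count_matches(df, gameweek, team, againstteam):
    """Return how many rows of df satisfy all three lookup conditions."""
    mask = (
        (df["GameWeek"] == gameweek)
        & (df["Team"] == team)
        & (df["AgainstTeam"] == againstteam)
    )
    return int(mask.sum())

# Exactly one row matches, so .item() would succeed for this key.
print(count_matches(df_melt_demo, 1, "Liverpool", "Norwich City"))

# No row matches because the team name is spelled differently,
# so .item() would raise the ValueError for this key.
print(count_matches(df_melt_demo, 1, "Bournemouth", "Sheffield United"))
```

A count of 0 points to a key that is spelled or typed differently in the two frames, while a count above 1 points to duplicate rows.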

But when I print the full dataframe df_melt, like so:

with pd.option_context('display.max_rows', None, 'display.max_columns', None):  # more options can be specified also
    print(df_melt, df_melt.shape)

I get (538, 6) and can see that all the data is there, with nothing missing.


When I check the types, I see:

df_melt

MatchID        object
GameWeek       object
Date           object
Team           object
Home           object
AgainstTeam    object

df_pm

Player                 object
GameWeek                int64
Minutes                 int64
ForTeam                object
AgainstTeam            object
Goals                   int64
ShotsOnTarget           int64
ShotsInBox              int64
CloseShots              int64
TotalShots              int64
Headers                 int64
GoalAssists             int64
ShotOnTargetCreated     int64
ShotInBoxCreated        int64
CloseShotCreated        int64
TotalShotCreated        int64
HeadersCreated          int64

So there is a type mismatch here: GameWeek is object in df_melt but int64 in df_pm.
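To illustrate why such a mismatch alone can make every lookup come back empty, here is a small sketch. It assumes the object column holds strings such as "1"; gameweek_obj is a hypothetical stand-in for df_melt['GameWeek'], not the real data:

```python
import pandas as pd

# Hypothetical stand-in for df_melt['GameWeek'] stored as strings (dtype object).
gameweek_obj = pd.Series(["1", "1", "2"], dtype=object)

# Comparing string values with the integer 1 is False for every element.
matches_before = int((gameweek_obj == 1).sum())

# After converting to a numeric dtype, the same comparison finds the rows.
gameweek_num = pd.to_numeric(gameweek_obj)
matches_after = int((gameweek_num == 1).sum())

print(gameweek_obj.dtype, matches_before)
print(gameweek_num.dtype, matches_after)
```

With an all-False mask the selection is empty, and calling .item() on an empty selection raises the ValueError shown above.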


If I add the following line of code before running the iteration:

df_melt['GameWeek'] = pd.to_numeric(df_melt['GameWeek'])

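I successfully print dozens of 'match_id', 'date' and 'home' values for the first row of df_pm.itertuples() (nothing was printed before I added that line), only for it to break again on the second row with the error: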

ValueError: can only convert an array of size 1 to a Python scalar

How do I fix this?


Note: this is what comes after the code above.

def matchid_lookup(player, date, team, gameweek):
    try:
        try:
            return df_pm.loc[(df_pm['Date']==date)
                        &(df_pm['Player']==player), 'MatchID'].item()
        except:
            return df_pm.loc[(df_pm['Date']==date)
                        &(df_pm['ForTeam']==team), 'MatchID'].iloc[0]
    except:
        return df_pm.loc[(df_pm['GameWeek']==gameweek)
                        &(df_pm['Player']==player), 'MatchID'].item()

#Declare the list as a column in the player matches df
df_pm['MatchID']=match_ids
df_pm['Date']=pd.to_datetime(dates)
df_pm['Home']=home_away
df_pm['Position']=df_pm['Player'].map(pos_lookup)

#Get the match IDs column first in the dataframe
cols = list(df_pm.columns)
new_cols = ['MatchID', 'Date', 'Home','Position'] + cols[:-4]
df_pm = df_pm[new_cols]

#Bring in stats from api table
#First, get key identifiers into the api table to facilitate joining
df_api_stats['Player'] = df_api_stats['PlayerID'].map(player_lookup)
df_api_stats['Team'] = df_api_stats['PlayerID'].map(team_lookup)    
df_api_stats['MatchID'] = df_api_stats.apply(lambda x: matchid_lookup(x['Player'],
                                                                      x['Date'],
                                                                      x['Team'],
                                                                      x['GameWeek']), axis=1)
api_cols = ['Player', 'MatchID', 'BPS', 'MinutesPlayed',
            'CleanSheet', 'Saves', 'NetTransfersIn',
            'SelectedBy', 'Points', 'Price']

df_api_cols = df_api_stats[api_cols]

So there are some dates in df_api_stats that are not in df_pm, which you can see with:

print (set(pd.to_datetime(df_api_stats['Date'])) - set(pd.to_datetime(df_pm['Date'])))
{Timestamp('2020-01-29 00:00:00'),
 Timestamp('2020-02-28 00:00:00'),
 Timestamp('2020-02-29 00:00:00'),
 Timestamp('2020-03-01 00:00:00'),
 Timestamp('2020-03-07 00:00:00'),
 Timestamp('2020-03-08 00:00:00'),
 Timestamp('2020-03-09 00:00:00')}

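I am not sure what you want to do with the missing values, but to keep the function from failing you can add an except and return nan when there is no possible match.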

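import numpy as np

def matchid_lookup(player, date, team, gameweek):
    try:
        try:
            # Preferred: exactly one row for this player on this date
            return df_pm.loc[(df_pm['Date']==date)
                        &(df_pm['Player']==player), 'MatchID'].item()
        except Exception:
            # .item() failed (ValueError): take the first row for the team on this date
            return df_pm.loc[(df_pm['Date']==date)
                        &(df_pm['ForTeam']==team), 'MatchID'].iloc[0]
    except Exception:
        try:
            # .iloc[0] failed (IndexError): fall back to the player's row for this game week
            return df_pm.loc[(df_pm['GameWeek']==gameweek)
                            &(df_pm['Player']==player), 'MatchID'].item()
        except Exception:
            # No unambiguous match anywhere: return NaN instead of raising
            return np.nan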

Note: just before the for loop that was causing the problem, don't forget to do this:

df_melt['GameWeek'] = pd.to_numeric(df_melt['GameWeek'])
df_melt[['Team', 'AgainstTeam']] = df_melt[['Team', 'AgainstTeam']]\
                                          .replace('AFC Bournemouth', 'Bournemouth')
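Once the key columns agree in type and spelling, the row-by-row lookup could also be expressed as a single left merge. This is a swapped-in alternative to the loop, not the approach used above, and only a sketch under stated assumptions: df_melt_demo and df_pm_demo are hypothetical miniatures of the two frames, and "Player B", "Player C" and "Some Other Team" are invented placeholder values.

```python
import pandas as pd

# Hypothetical miniatures of the two frames, for illustration only.
df_melt_demo = pd.DataFrame({
    "MatchID": [46605, 46605, 46606],
    "GameWeek": [1, 1, 1],
    "Date": ["2019-08-09", "2019-08-09", "2019-08-10"],
    "Team": ["Liverpool", "Norwich City", "Bournemouth"],
    "Home": ["Home", "Away", "Home"],
    "AgainstTeam": ["Norwich City", "Liverpool", "Sheffield United"],
})
df_pm_demo = pd.DataFrame({
    "Player": ["Alisson", "Player B", "Player C"],
    "GameWeek": [1, 1, 1],
    "ForTeam": ["Liverpool", "Norwich City", "Some Other Team"],
    "AgainstTeam": ["Norwich City", "Liverpool", "Liverpool"],
})

# Align the key column names so both frames share the same merge keys.
lookup = df_melt_demo.rename(columns={"Team": "ForTeam"})

merged = df_pm_demo.merge(
    lookup[["GameWeek", "ForTeam", "AgainstTeam", "MatchID", "Date", "Home"]],
    on=["GameWeek", "ForTeam", "AgainstTeam"],
    how="left",               # keep every player row, matched or not
    validate="many_to_one",   # raise if a key appears more than once in lookup
    indicator=True,           # adds a _merge column: "both" or "left_only"
)

# Rows that found no match carry NaN in the looked-up columns.
unmatched = merged[merged["_merge"] == "left_only"]
print(merged[["Player", "MatchID", "Date", "Home"]])
print(len(unmatched), "row(s) without a match")
```

The design choice here is that the two failure modes of .item() become explicit: duplicate keys raise through validate, and missing keys show up as left_only rows instead of stopping the run.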
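
    return self.values.item()
ValueError: can only convert an array of size 1 to a Python scalar

The error above says that the selection passed to .item() does not contain exactly one element: it is either empty or holds several values. To use .item(), the selection must reduce to a single value so that it can be converted from an array to a scalar.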
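A short sketch of that behaviour, using a made-up Series rather than the real data. The helper item_or_nan is a hypothetical name introduced here for illustration, not a pandas function:

```python
import numpy as np
import pandas as pd

def item_or_nan(selection):
    """Return the only value in selection, or NaN unless it holds exactly one."""
    return selection.item() if len(selection) == 1 else np.nan

# Made-up values: 46606 appears twice, 99999 does not appear at all.
s = pd.Series([46605, 46606, 46606])

one = s[s == 46605]    # exactly one element
many = s[s == 46606]   # two elements
none = s[s == 99999]   # empty selection

for name, sel in [("one", one), ("many", many), ("none", none)]:
    try:
        print(name, "->", sel.item())
    except ValueError as exc:
        # Raised for both the empty and the multi-element selection
        print(name, "->", exc)

# The helper returns NaN instead of raising in the two failing cases.
print(item_or_nan(one), item_or_nan(many), item_or_nan(none))
```

Checking len(selection) first makes it possible to tell "no match" apart from "several matches" if they need different handling.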

