
Create new pandas columns from multiple columns

Here is the DataFrame:

   MatchId  EventCodeId  EventCode         Team1        Team2         Team1_Goals  Team2_Goals  xG_Team1            xG_Team2            CurrentPlaytime
0  865314   1029         Goal Home         Northampton  Crawley Town  2            2            2.067663207769023   0.8130662505484256  457040
1  865314   1029         Goal Home         Northampton  Crawley Town  2            2            2.067663207769023   0.8130662505484256  1405394
2  865314   2053         Goal Away         Northampton  Crawley Town  2            2            2.067663207769023   0.8130662505484256  1898705
3  865314   2053         Goal Away         Northampton  Crawley Town  2            2            2.067663207769023   0.8130662505484256  4388278
4  865314   1029         Goal Home         Northampton  Crawley Town  2            2            2.067663207769023   0.8130662505484256  4507898
5  865314   1030         Cancel Goal Home  Northampton  Crawley Town  2            2            2.067663207769023   0.8130662505484256  4517728
6  865314   1029         Goal Home         Northampton  Crawley Town  2            2            2.067663207769023   0.8130662505484256  4956346
7  865314   1030         Cancel Goal Home  Northampton  Crawley Town  2            2            2.067663207769023   0.8130662505484256  4960633
8  865316   2053         Goal Away         Coventry     Bradford      0            0            1.0847662440468118  1.2526705617472387  447858
9  865316   2054         Cancel Goal Away  Coventry     Bradford      0            0            1.0847662440468118  1.2526705617472387  456361

The new columns should be created as follows:

for EventCodeId = 1029 and EventCode = Goal Home
new_col1 = CurrentPlaytime/(3*10**4)

for EventCodeId = 2053 and EventCode = Goal Away
new_col2 = CurrentPlaytime/(3*10**4)

For every other combination of EventCodeId and EventCode, new_col1 and new_col2 should be 0.

This is how I started, but I couldn't get any further. Please help.

new_col1 = []
new_col2 = []
def timeslot(EventCodeId, EventCode, CurrentPlaytime):
    if x == 1029 and y == 'Goal Home':
        new.Col1.append(z/(3*10**4))
    elif x == 2053 and y == 'Goal Away':
        new_col2.append(z/(3*10**4))
    else:
        new_col1.append(0)
        new_col2.append(0)
    return new_col1
    return new_col2



df1['new_col1', 'new_col2'] = df1.apply(lambda x,y,z: timeslot(x['EventCodeId'], y['EventCode'], z['CurrentPlaytime']), axis=1)  

TypeError: ("<lambda>() missing 2 required positional arguments: 'y' and 'z'", 'occurred at index 0')

You don't need an explicit loop. Use vectorized operations wherever possible.
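
As for the TypeError itself: with axis=1, DataFrame.apply calls the function once per row and passes that row as a single Series argument, so a lambda declared with three parameters (x, y, z) is missing two of them. A minimal sketch of a row-wise version that does run is shown here. The helper name timeslot_row and the three-row sample frame are assumptions made for illustration, and this approach is typically much slower than a vectorized one on large frames.

```python
import pandas as pd

# Small sample frame (values taken from a few rows of the question's data)
df1 = pd.DataFrame({
    "EventCodeId": [1029, 2053, 1030],
    "EventCode": ["Goal Home", "Goal Away", "Cancel Goal Home"],
    "CurrentPlaytime": [457040, 1898705, 4517728],
})

def timeslot_row(row):
    """Hypothetical helper: compute new_col1 and new_col2 for one row."""
    slot = row["CurrentPlaytime"] / (3 * 10**4)
    if row["EventCodeId"] == 1029 and row["EventCode"] == "Goal Home":
        return pd.Series({"new_col1": slot, "new_col2": 0.0})
    if row["EventCodeId"] == 2053 and row["EventCode"] == "Goal Away":
        return pd.Series({"new_col1": 0.0, "new_col2": slot})
    return pd.Series({"new_col1": 0.0, "new_col2": 0.0})

# apply(axis=1) hands each row to the function as ONE argument;
# returning a Series per row yields a DataFrame with one column per key
df1[["new_col1", "new_col2"]] = df1.apply(timeslot_row, axis=1)
print(df1)
```

Note that the function returns a value per row instead of appending to module-level lists, which is what lets apply assemble the two columns.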

Using numpy.where:

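import numpy as np

# Scaled playtime; the parentheses match z/(3*10**4) in the question's function
s = df1['CurrentPlaytime'] / (3 * 10**4)

# EventCode must match the full strings 'Goal Home' / 'Goal Away'
mask1 = (df1['EventCodeId'] == 1029) & (df1['EventCode'] == 'Goal Home')
mask2 = (df1['EventCodeId'] == 2053) & (df1['EventCode'] == 'Goal Away')

# Take s where the mask holds, 0 everywhere else
df1['new_col1'] = np.where(mask1, s, 0)
df1['new_col2'] = np.where(mask2, s, 0)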
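
Two details are easy to get wrong here. First, / and * have equal precedence and evaluate left to right, so CurrentPlaytime/3*10**4 means (CurrentPlaytime/3)*10**4, which is not the z/(3*10**4) used in the question's function. Second, the masks only match when compared against the full EventCode strings. The self-contained sketch below checks both points; the three-row sample frame is an assumption built from rows of the question's data, and the divisor follows the question's function, so adjust it if a different scaling is intended.

```python
import numpy as np
import pandas as pd

# Small sample frame (values taken from a few rows of the question's data)
df1 = pd.DataFrame({
    "EventCodeId": [1029, 2053, 1030],
    "EventCode": ["Goal Home", "Goal Away", "Cancel Goal Home"],
    "CurrentPlaytime": [457040, 1898705, 4517728],
})

# Parenthesized divisor, as in the question's function
s = df1["CurrentPlaytime"] / (3 * 10**4)

# Without parentheses this is (x / 3) * 10**4, larger by a factor of 10**8
unparenthesized = df1["CurrentPlaytime"] / 3 * 10**4

# Compare against the complete EventCode strings
mask1 = (df1["EventCodeId"] == 1029) & (df1["EventCode"] == "Goal Home")
mask2 = (df1["EventCodeId"] == 2053) & (df1["EventCode"] == "Goal Away")

# np.where picks s where the mask is True and 0 elsewhere
df1["new_col1"] = np.where(mask1, s, 0)
df1["new_col2"] = np.where(mask2, s, 0)

print(df1)
```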
