What is the optimal way to create a new column in Pandas dataframe based on conditions from another row?

I have a Pandas dataframe, week1_plays (a text sample of its rows and columns is included in the edit further down).

What I want to do is add a column week1_plays['distance_from_receiver'] such that, for each row in the dataframe, we take the keys gameId, playId, and frameId and find the x and y position of the player with those keys and position == 'WR'. I will then calculate the distance from the receiver with the following function:

import numpy as np

def get_distance(rec_x, rec_y, def_x, def_y):
    # Euclidean distance between the receiver (rec) and the defender (def)
    return np.sqrt((def_x - rec_x)**2 + (def_y - rec_y)**2)

For example, using the sample provided, the inputs to the function for row 0 would be

get_distance(91.35, 44.16, 88.89, 36.47)
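
As a quick check of that call, the following sketch restates the function together with the NumPy import it needs and evaluates it on the row 0 values. The variable name `d` is only for illustration.

```python
import numpy as np

def get_distance(rec_x, rec_y, def_x, def_y):
    # Euclidean distance between the receiver and the defender
    return np.sqrt((def_x - rec_x) ** 2 + (def_y - rec_y) ** 2)

# Receiver (WR) at (91.35, 44.16), defender (SS) at (88.89, 36.47)
d = get_distance(91.35, 44.16, 88.89, 36.47)
print(round(float(d), 2))
```

Rounded to two decimals this gives 8.07, the value listed for the first row in the expected output.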

My current solution is to use a lambda function on the dataframe:

week1_topReceivers['distance_from_receiver'] = week1_topReceivers.apply(lambda row: get_distance(week1_wr_position.loc[np.where((week1_topReceivers['playId'] == row['playId']) & (week1_topReceivers['frameId'] == row['frameId']) & (week1_topReceivers['gameId'] == row['gameId']))]['x'],
week1_topReceivers.loc[np.where((week1_topReceivers['playId'] == row['playId']) & (week1_topReceivers['frameId'] == row['frameId']) & (week1_topReceivers['gameId'] == row['gameId']))]['y'], row['x'], row['y']), axis = 1)

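But querying the dataframe for the first two inputs on every row takes a very long time. I know there must be a more efficient solution, but my online searches have not turned up a better option.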

Edit: here is a larger sample and the expected output:

Sample

x   y   o   dir event   position    frameId team    gameId  playId  playDirection   route
88.89   36.47   105.63  66.66   None    SS  1   home    2018090600  75  left    NaN
91.35   44.16   290.45  16.86   None    WR  1   away    2018090600  75  left    HITCH
86.31   22.01   70.12   168.91  None    FS  1   home    2018090600  75  left    NaN
73.64   28.70   103.05  219.41  None    FS  1   home    2018090600  75  left    NaN
86.48   31.12   95.90   33.36   None    MLB 1   home    2018090600  75  left    NaN
82.67   20.53   81.14   174.57  None    CB  1   home    2018090600  75  left    NaN
84.00   43.49   108.23  110.32  None    CB  1   home    2018090600  75  left    NaN
85.63   26.59   87.69   38.80   None    LB  1   home    2018090600  75  left    NaN
88.89   36.47   105.63  68.49   None    SS  2   home    2018090600  75  left    NaN
91.37   44.17   290.45  29.61   None    WR  2   away    2018090600  75  left    HITCH
86.32   22.00   70.88   119.04  None    FS  2   home    2018090600  75  left    NaN
73.64   28.70   104.57  228.17  None    FS  2   home    2018090600  75  left    NaN
86.48   31.11   101.10  30.26   None    MLB 2   home    2018090600  75  left    NaN
82.68   20.53   82.24   147.46  None    CB  2   home    2018090600  75  left    NaN
84.02   43.49   107.33  106.73  None    CB  2   home    2018090600  75  left    NaN
85.64   26.61   87.69   37.51   None    LB  2   home    2018090600  75  left    NaN
88.88   36.47   107.02  57.53   None    SS  3   home    2018090600  75  left    NaN
91.37   44.17   290.45  32.20   None    WR  3   away    2018090600  75  left    HITCH
86.33   22.00   71.88   93.49   None    FS  3   home    2018090600  75  left    NaN
73.63   28.69   104.57  227.74  None    FS  3   home    2018090600  75  left    NaN

Expected output:

x   y   o   dir event   position    frameId team    gameId  playId  playDirection   route   distance_from_receiver
88.89   36.47   105.63  66.66   None    SS  1   home    2018090600  75  left    NaN 8.07
91.35   44.16   290.45  16.86   None    WR  1   away    2018090600  75  left    HITCH   0.00
86.31   22.01   70.12   168.91  None    FS  1   home    2018090600  75  left    NaN 22.72
73.64   28.70   103.05  219.41  None    FS  1   home    2018090600  75  left    NaN 23.51
86.48   31.12   95.90   33.36   None    MLB 1   home    2018090600  75  left    NaN 13.92
82.67   20.53   81.14   174.57  None    CB  1   home    2018090600  75  left    NaN 25.17
84.00   43.49   108.23  110.32  None    CB  1   home    2018090600  75  left    NaN 7.38
85.63   26.59   87.69   38.80   None    LB  1   home    2018090600  75  left    NaN 18.48
88.89   36.47   105.63  68.49   None    SS  2   home    2018090600  75  left    NaN 8.09
91.37   44.17   290.45  29.61   None    WR  2   away    2018090600  75  left    HITCH   0.00
86.32   22.00   70.88   119.04  None    FS  2   home    2018090600  75  left    NaN 22.74
73.64   28.70   104.57  228.17  None    FS  2   home    2018090600  75  left    NaN 23.53
86.48   31.11   101.10  30.26   None    MLB 2   home    2018090600  75  left    NaN 13.95
82.68   20.53   82.24   147.46  None    CB  2   home    2018090600  75  left    NaN 25.19
84.02   43.49   107.33  106.73  None    CB  2   home    2018090600  75  left    NaN 7.39
85.64   26.61   87.69   37.51   None    LB  2   home    2018090600  75  left    NaN 18.47
88.88   36.47   107.02  57.53   None    SS  3   home    2018090600  75  left    NaN 8.09
91.37   44.17   290.45  32.20   None    WR  3   away    2018090600  75  left    HITCH   0.00
86.33   22.00   71.88   93.49   None    FS  3   home    2018090600  75  left    NaN 22.74
73.63   28.69   104.57  227.74  None    FS  3   home    2018090600  75  left    NaN 23.54

You are looking for a merge/join operation. Try something like this:

import pandas as pd

df = pd.DataFrame({'gameId':[1,1,1,1,1,1],'playId':[1,1,1,1,1,1],
                   'frameId':[1,1,1,2,2,2], 'position':['A','B','WR','C','WR','D'],
                   'x':[87,56,45,34,45,67], 'y':[25,36,47,365,25,36]})

# create a table with just the wide receiver positions:
wr = df.loc[df.position=='WR'].drop(columns='position')

# merge the wide receiver x,y values into the original table based on the keys:
df = df.merge(wr, how='outer', on=['gameId', 'playId', 'frameId'], suffixes=['', '_wr'])

# apply your function to calculate the column (avoid using apply because it's super slow)
df['dist_from_wr'] = [get_distance(x, y, x_wr, y_wr) for x, y, x_wr, y_wr
                      in zip(df.x, df.y, df.x_wr, df.y_wr)]

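Also note that you are lucky here because your function is already vectorized (which is not always the case), so you can do this more efficiently by passing whole columns as the input arguments, like this: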

df['dist_from_wr'] = get_distance(df.x, df.y, df.x_wr, df.y_wr)

Result:

| gameId | playId | frameId | position |   x |   y | x_wr | y_wr | dist_from_wr |
|-------:|-------:|--------:|:---------|----:|----:|-----:|-----:|-------------:|
|      1 |      1 |       1 | A        |  87 |  25 |   45 |   47 |      47.4131 |
|      1 |      1 |       1 | B        |  56 |  36 |   45 |   47 |      15.5563 |
|      1 |      1 |       1 | WR       |  45 |  47 |   45 |   47 |       0      |
|      1 |      1 |       2 | C        |  34 | 365 |   45 |   25 |     340.178  |
|      1 |      1 |       2 | WR       |  45 |  25 |   45 |   25 |       0      |
|      1 |      1 |       2 | D        |  67 |  36 |   45 |   25 |      24.5967 |
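
To tie this back to the question, here is a minimal sketch of the same merge on a few rows copied from the question's sample. It assumes there is exactly one WR per (gameId, playId, frameId) combination; with several receivers in a frame the merge would produce one row per player/receiver pair, and you would have to decide how to reduce them. The names `plays`, `keys`, `merged`, `x_wr`, and `y_wr` are illustrative and not from the original code.

```python
import numpy as np
import pandas as pd

def get_distance(rec_x, rec_y, def_x, def_y):
    # Euclidean distance; works on scalars and on whole columns
    return np.sqrt((def_x - rec_x) ** 2 + (def_y - rec_y) ** 2)

keys = ['gameId', 'playId', 'frameId']

# A few rows copied from the question's sample (frames 1 and 2)
plays = pd.DataFrame({
    'x': [88.89, 91.35, 84.00, 88.89, 91.37, 86.32],
    'y': [36.47, 44.16, 43.49, 36.47, 44.17, 22.00],
    'position': ['SS', 'WR', 'CB', 'SS', 'WR', 'FS'],
    'frameId': [1, 1, 1, 2, 2, 2],
    'gameId': [2018090600] * 6,
    'playId': [75] * 6,
})

# Receiver coordinates per frame (assumption: one WR per key combination)
wr = plays.loc[plays['position'] == 'WR', keys + ['x', 'y']]

# A left merge keeps every original row in its original order
merged = plays.merge(wr, how='left', on=keys, suffixes=('', '_wr'))

# Vectorized call: whole columns are passed instead of looping over rows
merged['distance_from_receiver'] = get_distance(
    merged['x_wr'], merged['y_wr'], merged['x'], merged['y']
).round(2)

print(merged[['position', 'frameId', 'distance_from_receiver']])
```

For these six rows the computed distances agree with the expected output in the question, for example 8.07 for the SS in frame 1 and 0.00 for each WR row.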
