What is the optimal way to create a new column in a Pandas dataframe based on conditions from another row?
I have a Pandas dataframe, week1_plays, in the following format:
What I want to do is add a column, week1_plays['distance_from_receiver'], such that for each row in the dataframe we take that row's gameId, playId and frameId keys and find the x and y position of the player with the same keys and position == 'WR'. I will then calculate the distance from the receiver with the following function:
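import numpy as np
def get_distance(rec_x, rec_y, def_x, def_y):
    # Euclidean distance between the receiver (rec_x, rec_y) and another player (def_x, def_y)
    distance = np.sqrt(((def_x - rec_x) ** 2) + ((def_y - rec_y) ** 2))
    return distance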
For example, using the sample provided, the function call for row 0 would be:
get_distance(91.35, 44.16, 88.89, 36.47)
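A quick check of the row-0 arithmetic, plus a demonstration that the same function also works element-wise when given NumPy arrays instead of scalars. The three frame-1 coordinates below are copied from the sample table; the variable names are only for illustration.

```python
import numpy as np

def get_distance(rec_x, rec_y, def_x, def_y):
    # Euclidean distance between the receiver and another player
    return np.sqrt((def_x - rec_x) ** 2 + (def_y - rec_y) ** 2)

# Row 0: receiver (WR) at (91.35, 44.16), SS at (88.89, 36.47), both in frame 1
row0 = get_distance(91.35, 44.16, 88.89, 36.47)
print(round(float(row0), 2))  # → 8.07

# Because the body only uses arithmetic and np.sqrt, arrays are handled element-wise:
# here the SS, WR and first FS of frame 1 are all measured against the same receiver
xs = np.array([88.89, 91.35, 86.31])
ys = np.array([36.47, 44.16, 22.01])
frame1 = get_distance(91.35, 44.16, xs, ys)
print(np.round(frame1, 2))
```

This element-wise behaviour is what makes a whole-column call possible once each row carries its receiver's coordinates.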
My current solution is to use apply with a lambda function on the dataframe:
week1_topReceivers['distance_from_receiver'] = week1_topReceivers.apply(lambda row: get_distance(
    week1_wr_position.loc[(week1_wr_position['playId'] == row['playId']) & (week1_wr_position['frameId'] == row['frameId']) & (week1_wr_position['gameId'] == row['gameId']), 'x'].iloc[0],
    week1_wr_position.loc[(week1_wr_position['playId'] == row['playId']) & (week1_wr_position['frameId'] == row['frameId']) & (week1_wr_position['gameId'] == row['gameId']), 'y'].iloc[0],
    row['x'], row['y']), axis=1)
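However, looking up the first two inputs (the WR's x and y) takes a very long time, because every row filters the whole dataframe twice. I know there must be a more efficient solution, but my searching online hasn't turned up a better option.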
Edit: here is a larger sample and the expected output:
Sample:
x y o dir event position frameId team gameId playId playDirection route
88.89 36.47 105.63 66.66 None SS 1 home 2018090600 75 left NaN
91.35 44.16 290.45 16.86 None WR 1 away 2018090600 75 left HITCH
86.31 22.01 70.12 168.91 None FS 1 home 2018090600 75 left NaN
73.64 28.70 103.05 219.41 None FS 1 home 2018090600 75 left NaN
86.48 31.12 95.90 33.36 None MLB 1 home 2018090600 75 left NaN
82.67 20.53 81.14 174.57 None CB 1 home 2018090600 75 left NaN
84.00 43.49 108.23 110.32 None CB 1 home 2018090600 75 left NaN
85.63 26.59 87.69 38.80 None LB 1 home 2018090600 75 left NaN
88.89 36.47 105.63 68.49 None SS 2 home 2018090600 75 left NaN
91.37 44.17 290.45 29.61 None WR 2 away 2018090600 75 left HITCH
86.32 22.00 70.88 119.04 None FS 2 home 2018090600 75 left NaN
73.64 28.70 104.57 228.17 None FS 2 home 2018090600 75 left NaN
86.48 31.11 101.10 30.26 None MLB 2 home 2018090600 75 left NaN
82.68 20.53 82.24 147.46 None CB 2 home 2018090600 75 left NaN
84.02 43.49 107.33 106.73 None CB 2 home 2018090600 75 left NaN
85.64 26.61 87.69 37.51 None LB 2 home 2018090600 75 left NaN
88.88 36.47 107.02 57.53 None SS 3 home 2018090600 75 left NaN
91.37 44.17 290.45 32.20 None WR 3 away 2018090600 75 left HITCH
86.33 22.00 71.88 93.49 None FS 3 home 2018090600 75 left NaN
73.63 28.69 104.57 227.74 None FS 3 home 2018090600 75 left NaN
Expected output:
x y o dir event position frameId team gameId playId playDirection route distance_from_receiver
88.89 36.47 105.63 66.66 None SS 1 home 2018090600 75 left NaN 8.07
91.35 44.16 290.45 16.86 None WR 1 away 2018090600 75 left HITCH 0.00
86.31 22.01 70.12 168.91 None FS 1 home 2018090600 75 left NaN 22.72
73.64 28.70 103.05 219.41 None FS 1 home 2018090600 75 left NaN 23.51
86.48 31.12 95.90 33.36 None MLB 1 home 2018090600 75 left NaN 13.92
82.67 20.53 81.14 174.57 None CB 1 home 2018090600 75 left NaN 25.17
84.00 43.49 108.23 110.32 None CB 1 home 2018090600 75 left NaN 7.38
85.63 26.59 87.69 38.80 None LB 1 home 2018090600 75 left NaN 18.48
88.89 36.47 105.63 68.49 None SS 2 home 2018090600 75 left NaN 8.09
91.37 44.17 290.45 29.61 None WR 2 away 2018090600 75 left HITCH 0.00
86.32 22.00 70.88 119.04 None FS 2 home 2018090600 75 left NaN 22.74
73.64 28.70 104.57 228.17 None FS 2 home 2018090600 75 left NaN 23.53
86.48 31.11 101.10 30.26 None MLB 2 home 2018090600 75 left NaN 13.95
82.68 20.53 82.24 147.46 None CB 2 home 2018090600 75 left NaN 25.19
84.02 43.49 107.33 106.73 None CB 2 home 2018090600 75 left NaN 7.39
85.64 26.61 87.69 37.51 None LB 2 home 2018090600 75 left NaN 18.47
88.88 36.47 107.02 57.53 None SS 3 home 2018090600 75 left NaN 8.09
91.37 44.17 290.45 32.20 None WR 3 away 2018090600 75 left HITCH 0.00
86.33 22.00 71.88 93.49 None FS 3 home 2018090600 75 left NaN 22.74
73.63 28.69 104.57 227.74 None FS 3 home 2018090600 75 left NaN 23.54
You are looking for a merge or join operation. Try something like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'gameId': [1, 1, 1, 1, 1, 1], 'playId': [1, 1, 1, 1, 1, 1],
                   'frameId': [1, 1, 1, 2, 2, 2], 'position': ['A', 'B', 'WR', 'C', 'WR', 'D'],
                   'x': [87, 56, 45, 34, 45, 67], 'y': [25, 36, 47, 365, 25, 36]})
# create a table with just the wide receiver positions:
wr = df.loc[df.position=='WR'].drop(columns='position')
# merge the wide receiver x,y values into the original table based on the keys:
df = df.merge(wr, how='outer', on=['gameId', 'playId', 'frameId'], suffixes=['', '_wr'])
# apply your function with a list comprehension over zip (avoid DataFrame.apply, which is very slow row by row)
df['dist_from_wr'] = [get_distance(x, y, x_wr, y_wr)
                      for x, y, x_wr, y_wr in zip(df.x, df.y, df.x_wr, df.y_wr)]
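Also note that you are lucky here, because your function is already vectorized (not always the case), so you can do this even more efficiently by passing whole columns as the input arguments, like this: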
df['dist_from_wr'] = get_distance(df.x, df.y, df.x_wr, df.y_wr)
Result:
| gameId | playId | frameId | position | x | y | x_wr | y_wr | dist_from_wr |
|-------:|-------:|--------:|:---------|----:|----:|-----:|-----:|-------------:|
| 1 | 1 | 1 | A | 87 | 25 | 45 | 47 | 47.4131 |
| 1 | 1 | 1 | B | 56 | 36 | 45 | 47 | 15.5563 |
| 1 | 1 | 1 | WR | 45 | 47 | 45 | 47 | 0 |
| 1 | 1 | 2 | C | 34 | 365 | 45 | 25 | 340.178 |
| 1 | 1 | 2 | WR | 45 | 25 | 45 | 25 | 0 |
| 1 | 1 | 2 | D | 67 | 36 | 45 | 25 | 24.5967 |
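A sketch applying the same merge-then-vectorize idea to frame 1 of the question's sample (only the key, position, x and y columns are kept). Two changes from the code above are my own assumptions about what the asker wants: how='left' keeps every original row in its original order, and validate='many_to_one' makes pandas raise a MergeError if some (gameId, playId, frameId) has more than one WR, instead of silently duplicating rows.

```python
import numpy as np
import pandas as pd

def get_distance(rec_x, rec_y, def_x, def_y):
    return np.sqrt((def_x - rec_x) ** 2 + (def_y - rec_y) ** 2)

# Frame 1 of the question's sample, reduced to the columns the calculation needs
week1_plays = pd.DataFrame({
    'gameId': [2018090600] * 8,
    'playId': [75] * 8,
    'frameId': [1] * 8,
    'position': ['SS', 'WR', 'FS', 'FS', 'MLB', 'CB', 'CB', 'LB'],
    'x': [88.89, 91.35, 86.31, 73.64, 86.48, 82.67, 84.00, 85.63],
    'y': [36.47, 44.16, 22.01, 28.70, 31.12, 20.53, 43.49, 26.59],
})

keys = ['gameId', 'playId', 'frameId']
# One row per frame holding the WR's coordinates
wr = week1_plays.loc[week1_plays['position'] == 'WR', keys + ['x', 'y']]

# validate='many_to_one' raises MergeError if the WR table has duplicate keys
merged = week1_plays.merge(wr, how='left', on=keys,
                           suffixes=('', '_wr'), validate='many_to_one')

# Whole-column (vectorized) call, receiver coordinates first as in get_distance's signature
merged['distance_from_receiver'] = get_distance(
    merged['x_wr'], merged['y_wr'], merged['x'], merged['y']).round(2)
print(merged[['position', 'distance_from_receiver']])
```

The printed distances match the frame-1 rows of the expected output (8.07, 0.00, 22.72, 23.51, 13.92, 25.17, 7.38, 18.48). If a frame can legitimately contain several WRs, you first need to decide which receiver each player should be measured against before merging.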