pandas add column to dataframe having the value from another row based on condition
I have a dataframe with columns named 'id', 'x', 'y', and 'time':
id | time | x | y |
---|---|---|---|
1 | 0 | 14 | 12 |
1 | 1 | 32 | 23 |
1 | 2 | 52 | 14 |
2 | 2 | 12 | 34 |
3 | 0 | 62 | 17 |
3 | 1 | 82 | 35 |
3 | 2 | 22 | 25 |
I want to add two columns to the dataframe that hold the values of x and y from another row with the same id and a time that is 2 greater (time + 2).
The result should look like this:
id | time | x | y | x2 | y2 |
---|---|---|---|---|---|
1 | 0 | 14 | 12 | 52 | 14 |
1 | 1 | 32 | 23 | | |
1 | 2 | 52 | 14 | | |
2 | 2 | 12 | 34 | | |
3 | 0 | 62 | 17 | 22 | 25 |
3 | 1 | 82 | 35 | | |
3 | 2 | 22 | 25 | | |
Please note that the dataframe is not sorted by id.
I have tried the following for x2, but it does not work as intended:
t=2
data['x2'] = data.apply(lambda x: x['x'] if (data[(data['id']==x['id']) & ((data['time']+t) == x['time'])].size > 0) else '', axis=1)
The following works, but I need a more concise approach with the best possible performance, because my data is huge:
t = 2
for index, row in data.iterrows():
    rowT = data[(data['id'] == row['id']) & (data['time'] == (row['time'] + t))]
    if rowT.size > 0:
        data.loc[index, 'x2'] = rowT['x'].values[0]
You can create a new dataframe by replacing the values in the time column with time - 2, so that each row recorded at time t + 2 lines up with the row recorded at time t. Then left merge the original dataframe with this new dataframe on the columns id and time to get the result:
df_r = df.assign(time=df['time'].sub(2))
df.merge(df_r, on=['id', 'time'], how='left', suffixes=['', '2'])
   id  time   x   y    x2    y2
0   1     0  14  12  52.0  14.0
1   1     1  32  23   NaN   NaN
2   1     2  52  14   NaN   NaN
3   2     2  12  34   NaN   NaN
4   3     0  62  17  22.0  25.0
5   3     1  82  35   NaN   NaN
6   3     2  22  25   NaN   NaN
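The snippet above assumes that df already exists. A minimal self-contained sketch of the same merge idea follows, using the sample data from the question. The helper name add_future_values and its parameters are hypothetical and not part of pandas. It also assumes each (id, time) pair occurs at most once, which the validate argument of merge checks.

```python
import pandas as pd


def add_future_values(df, offset=2, value_cols=("x", "y")):
    """Hypothetical helper: attach value_cols from the row with the same id
    and time + offset. Rows without such a partner get NaN."""
    value_cols = list(value_cols)
    # Shift the lookup copy back by `offset`, so a row recorded at
    # time t + offset lines up with the row recorded at time t.
    shifted = df[["id", "time"] + value_cols].assign(time=df["time"] - offset)
    return df.merge(
        shifted,
        on=["id", "time"],
        how="left",
        suffixes=("", str(offset)),
        validate="many_to_one",  # assumption: (id, time) is unique in the data
    )


df = pd.DataFrame({
    "id":   [1, 1, 1, 2, 3, 3, 3],
    "time": [0, 1, 2, 2, 0, 1, 2],
    "x":    [14, 32, 52, 12, 62, 82, 22],
    "y":    [12, 23, 14, 34, 17, 35, 25],
})

# Shuffle the rows to mimic the unsorted input mentioned in the question.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
result = add_future_values(shuffled)
print(result.sort_values(["id", "time"]))
```

Because the merge matches on the key columns rather than on row position, the input does not need to be sorted by id. A left merge also keeps the rows of the original dataframe in their existing order.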
Look up time + 2 within each id:
import pandas as pd

ids = [1, 1, 1, 2, 3, 3, 3]
times = [0, 1, 2, 2, 0, 1, 2]
x = [14, 32, 52, 12, 62, 82, 22]
y = [12, 23, 14, 34, 17, 35, 25]
df = pd.DataFrame({'id': ids, 'time': times, 'x': x, 'y': y})
df['x2'] = 0  # 0 marks "no matching row"
df['y2'] = 0
for key, item in df.iterrows():
    lookup = item['time'] + 2
    # rows with the same id, recorded 2 time units later
    mask = (df['time'] == lookup) & (df['id'] == item['id'])
    results = df[mask]
    if len(results) > 0:
        row = results.iloc[0]
        df.loc[key, ['x2', 'y2']] = (row.x, row.y)
print(df)
output:
   id  time   x   y  x2  y2
0   1     0  14  12  52  14
1   1     1  32  23   0   0
2   1     2  52  14   0   0
3   2     2  12  34   0   0
4   3     0  62  17  22  25
5   3     1  82  35   0   0
6   3     2  22  25   0   0
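# no looping
# Shift a copy back by 2 so a row at time t + 2 lines up with the row at time t.
df2 = df[['id', 'time', 'x', 'y']].copy()
df2['time'] = df2['time'] - 2
# Keep the original rows (and their original time values) on the left side.
results = df[['id', 'time', 'x', 'y']].merge(
    df2, on=['id', 'time'], how="left", suffixes=('', '2')).fillna(0)
print(results)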
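Filling unmatched rows with 0 makes them indistinguishable from a genuine value of 0, while leaving NaN in place turns the new columns into floats. One possible alternative, sketched here under the assumption that x and y are always whole numbers, is pandas' nullable "Int64" dtype, which keeps integers and marks missing matches as <NA>:

```python
import pandas as pd

df = pd.DataFrame({
    "id":   [1, 1, 1, 2, 3, 3, 3],
    "time": [0, 1, 2, 2, 0, 1, 2],
    "x":    [14, 32, 52, 12, 62, 82, 22],
    "y":    [12, 23, 14, 34, 17, 35, 25],
})

# Same merge as before: shift a copy back by 2 and left merge on (id, time).
shifted = df.assign(time=df["time"] - 2)
merged = df.merge(shifted, on=["id", "time"], how="left", suffixes=("", "2"))

# The merge produces float columns containing NaN. Casting to the nullable
# "Int64" dtype restores whole numbers and represents missing matches as <NA>.
merged = merged.astype({"x2": "Int64", "y2": "Int64"})
print(merged)
```

Whether 0, NaN, or <NA> is the right marker depends on how the columns are used afterwards. This is only one option.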