Python Pandas: compare two DataFrames along one column and return the content of the matching rows of both in another DataFrame
It would be great to get a more robust solution to this problem, taking computation time, memory, and processing power into account (Intel Core i7-6700HQ, 8 GB RAM).
Here is the sample data:
import pandas as pd
df1 = pd.DataFrame({'time': [35427889701, 35427909854, 35427929709, 35427949712, 35428009860],
                    'velocity_x': [12.5451, 12.5401, 12.5351, 12.5401, 12.5251],
                    'yaw': [-0.0787806, -0.0784749, -0.0794889, -0.0795915, -0.0795472]})
df2 = pd.DataFrame({'time': [35427929709, 35427949712, 35427009860, 35427029728, 35427049705],
                    'velocity': [12.6583, 12.6556, 12.6556, 12.6556, 12.6444],
                    'yawrate': [-0.0750492, -0.0750492, -0.074351, -0.074351, -0.074351]})
df3 = pd.DataFrame(columns=['time', 'velocity_x', 'yaw', 'velocity', 'yawrate'])
# For each row of df1, scan df2 for the row whose time is closest.
for index, row in df1.iterrows():
    min_delta = 100000  # only accept matches closer than this
    pos = None
    for indexer, rows in df2.iterrows():
        delta = abs(float(row['time']) - float(rows['time']))
        if delta < min_delta:
            min_delta = delta
            # storing the position
            pos = indexer
    if pos is None:
        continue  # no df2 row within the cutoff
    # df1 values come from the current df1 row, df2 values from the match
    df3.loc[index, 'time'] = df1['time'][index]
    df3.loc[index, 'velocity_x'] = df1['velocity_x'][index]
    df3.loc[index, 'yaw'] = df1['yaw'][index]
    df3.loc[index, 'velocity'] = df2['velocity'][pos]
    df3.loc[index, 'yawrate'] = df2['yawrate'][pos]
# Alternative: cross join on a dummy key, then keep the closest row per df2 time
df1['key'] = 1
df2['key'] = 1
df1.rename(index=str, columns ={'time' : 'time_x'}, inplace=True)
df = df2.merge(df1, on='key', how ='left').reset_index()
df['diff'] = df.apply(lambda x: abs(x['time'] - x['time_x']), axis=1)
df.sort_values(by=['time', 'diff'], inplace=True)
df=df.groupby(['time']).first().reset_index()[['time', 'velocity_x', 'yaw', 'velocity', 'yawrate']]
You're looking for pandas.merge_asof. It allows you to combine two DataFrames on a key, in this case time, without requiring an exact match. You can choose a direction for prioritizing the match, but in this case it's obvious that you want nearest:
A "nearest" search selects the row in the right DataFrame whose 'on' key is closest in absolute distance to the left's key.
One caveat is that both DataFrames need to be sorted on the key for merge_asof to work.
import pandas as pd
pd.merge_asof(df2.sort_values('time'), df1.sort_values('time'), on='time', direction='nearest')
#           time  velocity   yawrate  velocity_x       yaw
# 0  35427009860   12.6556 -0.074351     12.5451 -0.078781
# 1  35427029728   12.6556 -0.074351     12.5451 -0.078781
# 2  35427049705   12.6444 -0.074351     12.5451 -0.078781
# 3  35427929709   12.6583 -0.075049     12.5351 -0.079489
# 4  35427949712   12.6556 -0.075049     12.5401 -0.079591
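The loop in the question starts its running minimum at 100000, which in effect ignores any candidate farther away than that. If that cutoff matters, merge_asof has a tolerance parameter that should give similar behaviour. Here is a minimal sketch using the sample data; the name capped is just for illustration, and whether a distance of exactly 100000 should count as a match is something to check against your own requirements.

```python
import pandas as pd

df1 = pd.DataFrame({'time': [35427889701, 35427909854, 35427929709, 35427949712, 35428009860],
                    'velocity_x': [12.5451, 12.5401, 12.5351, 12.5401, 12.5251],
                    'yaw': [-0.0787806, -0.0784749, -0.0794889, -0.0795915, -0.0795472]})
df2 = pd.DataFrame({'time': [35427929709, 35427949712, 35427009860, 35427029728, 35427049705],
                    'velocity': [12.6583, 12.6556, 12.6556, 12.6556, 12.6444],
                    'yawrate': [-0.0750492, -0.0750492, -0.074351, -0.074351, -0.074351]})

# Same nearest-match merge, but rows of df2 with no df1 time within
# the tolerance get NaN in the df1 columns instead of a distant match.
capped = pd.merge_asof(df2.sort_values('time'), df1.sort_values('time'),
                       on='time', direction='nearest', tolerance=100000)
print(capped)
```

With this data the three earliest df2 rows are several hundred thousand time units away from anything in df1, so their velocity_x and yaw come back as NaN; you can drop them with dropna if you only want real matches.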
Just be careful about which DataFrame you choose as the left or right frame, as that changes the result. In this case I'm selecting the time in df1 that is closest in absolute distance to each time in df2.
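To see how the choice of left and right frame changes things, here is a small sketch with the frames swapped, again on the sample data (the name swapped is only illustrative). The result now has one row per df1 time, each paired with its nearest df2 row.

```python
import pandas as pd

df1 = pd.DataFrame({'time': [35427889701, 35427909854, 35427929709, 35427949712, 35428009860],
                    'velocity_x': [12.5451, 12.5401, 12.5351, 12.5401, 12.5251],
                    'yaw': [-0.0787806, -0.0784749, -0.0794889, -0.0795915, -0.0795472]})
df2 = pd.DataFrame({'time': [35427929709, 35427949712, 35427009860, 35427029728, 35427049705],
                    'velocity': [12.6583, 12.6556, 12.6556, 12.6556, 12.6444],
                    'yawrate': [-0.0750492, -0.0750492, -0.074351, -0.074351, -0.074351]})

# df1 is the left frame here, so its times are kept and the
# nearest df2 row is attached to each of them.
swapped = pd.merge_asof(df1.sort_values('time'), df2.sort_values('time'),
                        on='time', direction='nearest')
print(swapped)
```

This orientation is the one the loop in the question was going for (iterate over df1, search df2), so it may be closer to what you actually want.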
You also need to be careful if you have duplicated on keys in the right df, because for exact matches merge_asof only merges the last sorted row of the right df to the left df, instead of creating multiple entries for each exact match. If that's a problem, you can instead merge the exact keys first to get all of the combinations, and then merge the remainder with asof.
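One way the two-step merge could look is sketched below. The frames left and right and their columns a and b are made up for illustration (they are not the question's data), and the sketch assumes you want every exact-key combination plus a nearest match for the rest.

```python
import pandas as pd

# Hypothetical data: the key 1 appears twice in the right frame.
left = pd.DataFrame({'time': [1, 5, 10], 'a': ['x', 'y', 'z']})
right = pd.DataFrame({'time': [1, 1, 6], 'b': [10, 11, 12]})

# Step 1: an ordinary inner merge keeps every exact-match combination.
exact = left.merge(right, on='time', how='inner')

# Step 2: only left rows without an exact match go through merge_asof.
remainder = left[~left['time'].isin(right['time'])]
approx = pd.merge_asof(remainder.sort_values('time'), right.sort_values('time'),
                       on='time', direction='nearest')

# Stable sort so the duplicated exact matches keep their original order.
combined = (pd.concat([exact, approx], ignore_index=True)
              .sort_values('time', kind='stable')
              .reset_index(drop=True))
print(combined)
```

Here time 1 yields two rows (one per duplicate in right), while times 5 and 10 each pick up the nearest right row at time 6.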
Just a side note (not an answer):
min_delta = 100000
for indexer, rows in df2.iterrows():
    if abs(float(row['time']) - float(rows['time'])) < min_delta:
        min_delta = abs(float(row['time']) - float(rows['time']))
        # storing the position
        pos = indexer
can be written as
import numpy as np
diff = np.abs(row['time'] - df2['time'])
# position of the smallest difference; it equals the index label here
# because df2 has a default RangeIndex
pos = np.argmin(diff)
(avoid for loops whenever you can), and don't name your variables after a built-in (min).
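Taking the side note one step further, the outer loop over df1 can go away too by letting NumPy broadcasting compute every pairwise time difference at once. This is only a sketch on the sample data; the names t1, t2, diff, pos and matched are illustrative, and unlike the original loop it applies no 100000 cutoff.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'time': [35427889701, 35427909854, 35427929709, 35427949712, 35428009860],
                    'velocity_x': [12.5451, 12.5401, 12.5351, 12.5401, 12.5251]})
df2 = pd.DataFrame({'time': [35427929709, 35427949712, 35427009860, 35427029728, 35427049705],
                    'velocity': [12.6583, 12.6556, 12.6556, 12.6556, 12.6444]})

t1 = df1['time'].to_numpy()
t2 = df2['time'].to_numpy()

# diff[i, j] = |df1 time i - df2 time j|, shape (len(df1), len(df2))
diff = np.abs(t1[:, None] - t2[None, :])

# For each df1 row, the position of the closest df2 row
pos = diff.argmin(axis=1)

# Attach the matched df2 values to df1 by position
matched = df1.copy()
matched['velocity'] = df2['velocity'].to_numpy()[pos]
print(matched)
```

Keep in mind that diff holds len(df1) × len(df2) values, so on large frames with limited RAM the merge_asof approach is likely the better fit.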