Merging two pandas dataframes by interval
I have two pandas dataframes with the following format:
df_ts = pd.DataFrame([
[10, 20, 1, 'id1'],
[11, 22, 5, 'id1'],
[20, 54, 5, 'id2'],
[22, 53, 7, 'id2'],
[15, 24, 8, 'id1'],
[16, 25, 10, 'id1']
], columns = ['x', 'y', 'ts', 'id'])
df_statechange = pd.DataFrame([
['id1', 2, 'ok'],
['id2', 4, 'not ok'],
['id1', 9, 'not ok']
], columns = ['id', 'ts', 'state'])
I am trying to combine them into a format like this:
df_out = pd.DataFrame([
[10, 20, 1, 'id1', None ],
[11, 22, 5, 'id1', 'ok' ],
[20, 54, 5, 'id2', 'not ok'],
[22, 53, 7, 'id2', 'not ok'],
[15, 24, 8, 'id1', 'ok' ],
[16, 25, 10, 'id1', 'not ok']
], columns = ['x', 'y', 'ts', 'id', 'state'])
I understand how to accomplish this iteratively by grouping by id and then iterating through each row, changing the state whenever a state change appears. Is there a more scalable, built-in pandas way of doing this?
Unfortunately, pandas merge supports only equality joins. See the following thread for more details: "merge pandas dataframes where one value is between two others". If you want to merge by interval, you'll need to work around this limitation, for example by adding a filter after the merge:
joined = a.merge(b,on='id')
joined = joined[joined.ts.between(joined.ts1,joined.ts2)]
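As a minimal sketch, the merge-then-filter idea can be applied to the question's df_ts and df_statechange roughly as follows. The ts_start and ts_end column names are assumptions introduced here for illustration: each state change is treated as holding from its own ts until the next change for the same id.

```python
import numpy as np
import pandas as pd

df_ts = pd.DataFrame([
    [10, 20, 1, 'id1'],
    [11, 22, 5, 'id1'],
    [20, 54, 5, 'id2'],
    [22, 53, 7, 'id2'],
    [15, 24, 8, 'id1'],
    [16, 25, 10, 'id1'],
], columns=['x', 'y', 'ts', 'id'])

df_statechange = pd.DataFrame([
    ['id1', 2, 'ok'],
    ['id2', 4, 'not ok'],
    ['id1', 9, 'not ok'],
], columns=['id', 'ts', 'state'])

# Turn each state change into a half-open interval [ts_start, ts_end) per id.
# ts_start / ts_end are illustrative names, not part of the original data.
intervals = (df_statechange
             .sort_values(['id', 'ts'])
             .rename(columns={'ts': 'ts_start'}))
intervals['ts_end'] = (intervals.groupby('id')['ts_start']
                       .shift(-1)
                       .fillna(np.inf))  # the last state of an id stays open-ended

# Equality join on id, then keep only rows whose ts falls inside the interval.
joined = df_ts.reset_index().merge(intervals, on='id')
joined = joined[(joined.ts >= joined.ts_start) & (joined.ts < joined.ts_end)]

# Re-attach to the original rows so rows before the first change keep a missing state.
df_out = df_ts.join(joined.set_index('index')['state'])
print(df_out)
```

Note that the intermediate join pairs every row with every interval of the same id, so it can grow large when an id has many state changes.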
You can merge pandas data frames on two columns:
pd.merge(df_ts,df_statechange, how='left',on=['id','ts'])
In the df_statechange that you shared here, the two dataframes have no ts values in common. Apparently you copied an incomplete data frame here. So I got this output:
    x   y  ts   id state
0  10  20   1  id1   NaN
1  11  22   5  id1   NaN
2  20  54   5  id2   NaN
3  22  53   7  id2   NaN
4  15  24   8  id1   NaN
5  16  25  10  id1   NaN
But if the data frames do share ts values, the merge fills in state for the rows that match on both id and ts. For example:
df_statechange = pd.DataFrame([
['id1', 5, 'ok'],
['id1', 8, 'ok'],
['id2', 5, 'not ok'],
['id2',7, 'not ok'],
['id1', 9, 'not ok']
], columns = ['id', 'ts', 'state'])
The output:
    x   y  ts   id   state
0  10  20   1  id1     NaN
1  11  22   5  id1      ok
2  20  54   5  id2  not ok
3  22  53   7  id2  not ok
4  15  24   8  id1      ok
5  16  25  10  id1     NaN
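The equality merge above only labels rows whose ts exactly matches a state change. If the goal is instead to carry each id's most recent state forward, as in the question's df_out, pandas.merge_asof may be a closer fit. This is a different technique from the one this answer uses, sketched here on the question's original data and assuming both frames can be sorted by ts:

```python
import pandas as pd

df_ts = pd.DataFrame([
    [10, 20, 1, 'id1'],
    [11, 22, 5, 'id1'],
    [20, 54, 5, 'id2'],
    [22, 53, 7, 'id2'],
    [15, 24, 8, 'id1'],
    [16, 25, 10, 'id1'],
], columns=['x', 'y', 'ts', 'id'])

df_statechange = pd.DataFrame([
    ['id1', 2, 'ok'],
    ['id2', 4, 'not ok'],
    ['id1', 9, 'not ok'],
], columns=['id', 'ts', 'state'])

# merge_asof requires both frames to be sorted by the "on" key.
# A stable sort keeps rows with equal ts in their original relative order.
left = df_ts.sort_values('ts', kind='stable')
right = df_statechange.sort_values('ts', kind='stable')

# For each row of left, take the latest state change with the same id
# and ts <= the row's ts (direction='backward' is the default).
out = pd.merge_asof(left, right, on='ts', by='id', direction='backward')
print(out)
```

Rows that precede the first state change for their id get a missing state, which corresponds to the None in the first row of the desired df_out.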