
Merging two pandas dataframes by interval

I have two pandas dataframes in the following format:

import pandas as pd

df_ts = pd.DataFrame([
        [10, 20, 1,  'id1'],
        [11, 22, 5,  'id1'],
        [20, 54, 5,  'id2'],
        [22, 53, 7,  'id2'],
        [15, 24, 8,  'id1'],
        [16, 25, 10, 'id1']
    ], columns = ['x', 'y', 'ts', 'id'])


df_statechange = pd.DataFrame([
        ['id1', 2, 'ok'],
        ['id2', 4, 'not ok'],
        ['id1', 9, 'not ok']
    ], columns = ['id', 'ts', 'state'])

I am trying to get it into the following format:

df_out = pd.DataFrame([
        [10, 20, 1,  'id1', None    ],
        [11, 22, 5,  'id1', 'ok'    ],
        [20, 54, 5,  'id2', 'not ok'],
        [22, 53, 7,  'id2', 'not ok'],
        [15, 24, 8,  'id1', 'ok'    ],
        [16, 25, 10, 'id1', 'not ok']
    ], columns = ['x', 'y', 'ts', 'id', 'state'])

I understand how to accomplish this iteratively: group by id, then iterate through each row and change the state whenever a state change appears. Is there a built-in pandas way of doing this that scales better?
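The loop-based approach described above can be sketched roughly as follows, as a baseline to compare other answers against. This is a minimal sketch, assuming a state change applies from its own ts onward (inclusive) and that rows before the first change for an id get no state; the function name assign_states_iteratively is hypothetical.

```python
import pandas as pd

df_ts = pd.DataFrame([
        [10, 20, 1,  'id1'],
        [11, 22, 5,  'id1'],
        [20, 54, 5,  'id2'],
        [22, 53, 7,  'id2'],
        [15, 24, 8,  'id1'],
        [16, 25, 10, 'id1']
    ], columns=['x', 'y', 'ts', 'id'])

df_statechange = pd.DataFrame([
        ['id1', 2, 'ok'],
        ['id2', 4, 'not ok'],
        ['id1', 9, 'not ok']
    ], columns=['id', 'ts', 'state'])


def assign_states_iteratively(df_ts, df_statechange):
    """Hypothetical loop-based baseline: carry the latest state forward per id."""
    states = {}
    for id_, rows in df_ts.groupby('id'):
        # State changes for this id, in time order.
        changes = df_statechange[df_statechange['id'] == id_].sort_values('ts')
        change_ts = changes['ts'].tolist()
        change_state = changes['state'].tolist()
        current, k = None, 0
        # Walk this id's rows in ts order (stable sort keeps ties in order).
        for idx, ts in rows['ts'].sort_values(kind='stable').items():
            # Apply every state change at or before this timestamp.
            while k < len(change_ts) and change_ts[k] <= ts:
                current = change_state[k]
                k += 1
            states[idx] = current
    # assign() aligns the per-row states back onto df_ts by index label.
    return df_ts.assign(state=pd.Series(states, dtype=object))


out = assign_states_iteratively(df_ts, df_statechange)
print(out)
```

The inner loop is pure Python, so it gets slow for large frames; that cost is what motivates looking for a vectorized join.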

Unfortunately, pandas merge supports only equality joins. See the thread "merge pandas dataframes where one value is between two others" for more details. If you want to merge by interval, you'll need to work around this, for example by adding another filter after the merge:

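# a has a point column 'ts'; b has interval bounds 'ts1' and 'ts2'
joined = a.merge(b, on='id')
# keep only rows whose ts lies in [ts1, ts2] (between() is inclusive by default)
joined = joined[joined.ts.between(joined.ts1, joined.ts2)]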
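The two lines above use placeholder frames a and b that already carry interval columns ts1 and ts2. To apply the same merge-then-filter idea to the question's data, the state changes first have to be turned into intervals. A minimal sketch, assuming each state holds from its own ts up to, but not including, the next state change for the same id (a half-open interval, so a row that falls exactly on a change gets only the new state); the names b, joined, matched and out are illustrative. The equality merge on id pairs every row with every state change of that id, so memory grows with (rows per id) × (changes per id).

```python
import pandas as pd

df_ts = pd.DataFrame([
        [10, 20, 1,  'id1'],
        [11, 22, 5,  'id1'],
        [20, 54, 5,  'id2'],
        [22, 53, 7,  'id2'],
        [15, 24, 8,  'id1'],
        [16, 25, 10, 'id1']
    ], columns=['x', 'y', 'ts', 'id'])

df_statechange = pd.DataFrame([
        ['id1', 2, 'ok'],
        ['id2', 4, 'not ok'],
        ['id1', 9, 'not ok']
    ], columns=['id', 'ts', 'state'])

# Turn each state change into an interval [ts1, ts2) per id:
# ts2 is the ts of the next change for the same id, or +inf if there is none.
b = df_statechange.sort_values(['id', 'ts']).rename(columns={'ts': 'ts1'})
b['ts2'] = b.groupby('id')['ts1'].shift(-1).fillna(float('inf'))

# Equality join on id, then keep only rows whose ts falls inside the interval.
# reset_index() keeps df_ts's original row label in an 'index' column.
joined = df_ts.reset_index().merge(b, on='id', how='left')
mask = (joined['ts'] >= joined['ts1']) & (joined['ts'] < joined['ts2'])
matched = joined[mask]

# Re-attach rows with no matching interval (e.g. before the first change).
out = (df_ts.reset_index()
       .merge(matched[['index', 'state']], on='index', how='left')
       .set_index('index')
       .rename_axis(None))
print(out)
```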

You can merge pandas data frames on two columns:

pd.merge(df_ts, df_statechange, how='left', on=['id', 'ts'])

In the df_statechange you shared, there are no ts values common to both dataframes. Apparently you didn't copy the complete data frame here. So I got this output:

    x   y  ts   id state
0  10  20   1  id1   NaN
1  11  22   5  id1   NaN
2  20  54   5  id2   NaN
3  22  53   7  id2   NaN
4  15  24   8  id1   NaN
5  16  25  10  id1   NaN

But if the data frames do have ts values in common, those rows get matched. For example:

df_statechange = pd.DataFrame([
        ['id1', 5, 'ok'],
        ['id1', 8, 'ok'],
        ['id2', 5, 'not ok'],
        ['id2', 7, 'not ok'],
        ['id1', 9, 'not ok']
    ], columns = ['id', 'ts', 'state'])

The output:

    x   y  ts   id   state
0  10  20   1  id1     NaN
1  11  22   5  id1      ok
2  20  54   5  id2  not ok
3  22  53   7  id2  not ok
4  15  24   8  id1      ok
5  16  25  10  id1     NaN
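The exact-key merge above only fills rows whose ts appears in both frames, so with the question's original data every state stays NaN. What the question actually asks for is "the latest state change for the same id at or before each ts", which is an as-of join. pandas provides pd.merge_asof for this; it is not mentioned in either answer above, so treat the following as a hedged alternative rather than part of them. A minimal sketch, assuming the original df_ts and df_statechange from the question: merge_asof requires both frames to be sorted by the on key, so the original row order is kept in a helper 'index' column and restored afterwards.

```python
import pandas as pd

# Original data from the question.
df_ts = pd.DataFrame([
        [10, 20, 1,  'id1'],
        [11, 22, 5,  'id1'],
        [20, 54, 5,  'id2'],
        [22, 53, 7,  'id2'],
        [15, 24, 8,  'id1'],
        [16, 25, 10, 'id1']
    ], columns=['x', 'y', 'ts', 'id'])

df_statechange = pd.DataFrame([
        ['id1', 2, 'ok'],
        ['id2', 4, 'not ok'],
        ['id1', 9, 'not ok']
    ], columns=['id', 'ts', 'state'])

# merge_asof needs both sides sorted by the "on" key; remember df_ts's
# original row labels in an 'index' column so the order can be restored.
left = df_ts.reset_index().sort_values('ts', kind='stable')
right = df_statechange.sort_values('ts', kind='stable')

# For each row, take the last state change with the same id and ts <= row ts
# (direction='backward'; exact matches are allowed by default).
out = pd.merge_asof(left, right, on='ts', by='id', direction='backward')

# Restore df_ts's original order and drop the helper column name.
out = out.sort_values('index').set_index('index').rename_axis(None)
print(out)
```

Rows before the first state change for their id come back as NaN, matching the None in df_out. Unlike the merge-then-filter approach, this does not build the per-id cross product.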

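Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM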