How to do outer merge of two dataframes for which column values are within a certain range?

This is a follow-up to this question.

I have two dataframes:

print df_1

  timestamp      A          B
0 2016-05-15     0.020228   0.026572
1 2016-05-15     0.057780   0.175499
2 2016-05-15     0.098808   0.620986
3 2016-05-17     0.158789   1.014819
4 2016-05-17     0.038129   2.384590
5 2018-05-17     0.011111   9.999999


print df_2

  start                end  event    
0 2016-05-14   2016-05-16   E1
1 2016-05-14   2016-05-16   E2
2 2016-05-17   2016-05-18   E3

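I want to merge df_1 and df_2 and get the event column in df_1 whenever timestamp falls between start and end.

The problems, and what makes this different from that question, are:

1) Events E1 and E2 have the same start and end.

2) The 6th row of df_1 does not fall within any interval.

In the end, I want to keep both events, and get NA for the rows that have no event.

So I want my resulting dataframe to look like this: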

  timestamp      A          B         event
0 2016-05-15     0.020228   0.026572  E1
1 2016-05-15     0.057780   0.175499  E1
2 2016-05-15     0.098808   0.620986  E1
3 2016-05-15     0.020228   0.026572  E2 
4 2016-05-15     0.057780   0.175499  E2
5 2016-05-15     0.098808   0.620986  E2
6 2016-05-17     0.158789   1.014819  E3
7 2016-05-17     0.038129   2.384590  E3
8 2018-05-17     0.011111   9.999999  NA

import pandas as pd

df_1 = pd.DataFrame({'timestamp':['2016-05-15','2016-05-15','2016-05-15','2016-05-17','2016-05-17','2018-05-17'],
                     'A':[1,1,1,1,1,1]})
df_2 = pd.DataFrame({'start':['2016-05-14','2016-05-14','2016-05-17'],
                     'end':['2016-05-16','2016-05-16','2016-05-18'],
                     'event':['E1','E2','E3']})
df_1.timestamp = pd.to_datetime(df_1.timestamp, format='%Y-%m-%d')
df_2.start = pd.to_datetime(df_2.start, format='%Y-%m-%d')
df_2.end = pd.to_datetime(df_2.end, format='%Y-%m-%d')

# convert df_2 to long format with every date between start and end, then do a left merge on timestamp
df_2_2 = pd.melt(df_2, id_vars='event', value_name='timestamp')
df_2_2.timestamp = pd.to_datetime(df_2_2.timestamp)
df_2_2.set_index('timestamp', inplace=True)
df_2_2.drop('variable', axis=1, inplace=True)

df_2_3 = df_2_2.groupby('event').resample('D').ffill().reset_index(level=0, drop=True).reset_index()

df_2 = pd.merge(df_2, df_2_3)
df_2 = df_2.drop(columns=['start', 'end'])

df_1 = df_1.merge(df_2,on='timestamp',  how='left')

print(df_1)
   timestamp  A event
0 2016-05-15  1    E1
1 2016-05-15  1    E2
2 2016-05-15  1    E1
3 2016-05-15  1    E2
4 2016-05-15  1    E1
5 2016-05-15  1    E2
6 2016-05-17  1    E3
7 2016-05-17  1    E3
8 2018-05-17  1   NaN

Credit to
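
The long-format idea above (one row per event per day, then a left merge on timestamp) can also be sketched with pd.date_range instead of melt and resample. This is only a sketch under the assumption that all timestamps are whole days, as in the sample data; the names days and result are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

df_1 = pd.DataFrame({
    'timestamp': pd.to_datetime(['2016-05-15', '2016-05-15', '2016-05-15',
                                 '2016-05-17', '2016-05-17', '2018-05-17']),
    'A': [1, 1, 1, 1, 1, 1],
})
df_2 = pd.DataFrame({
    'start': pd.to_datetime(['2016-05-14', '2016-05-14', '2016-05-17']),
    'end': pd.to_datetime(['2016-05-16', '2016-05-16', '2016-05-18']),
    'event': ['E1', 'E2', 'E3'],
})

# Expand each interval into one row per calendar day (assumes daily granularity).
days = pd.concat(
    [
        pd.DataFrame({
            'timestamp': pd.date_range(row.start, row.end, freq='D'),
            'event': row.event,
        })
        for row in df_2.itertuples(index=False)
    ],
    ignore_index=True,
)

# Left merge keeps every row of df_1; days that match no event get NaN.
result = df_1.merge(days, on='timestamp', how='left')
print(result)
```

If the timestamps carry a time of day, this expansion no longer matches them exactly, and a range comparison is the safer route.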

There is also this solution, but it does not give NA for the last row:

import pandas as pd

df_1 = pd.DataFrame({'timestamp':['2016-05-15','2016-05-15','2016-05-15','2016-05-17','2016-05-17','2018-05-17'],
                     'A':[1,1,1,1,1,1]})
df_2 = pd.DataFrame({'start':['2016-05-14','2016-05-14','2016-05-17'],
                     'end':['2016-05-16','2016-05-16','2016-05-18'],
                     'event':['E1','E2','E3']})   

df_try2 = pd.merge(df_1.assign(key=1), df_2.assign(key=1), on='key').query('timestamp >= start and timestamp <= end')    

print(df_try2)

   timestamp  A  key      start        end event
0  2016-05-15  1    1 2016-05-14 2016-05-16    E1
1  2016-05-15  1    1 2016-05-14 2016-05-16    E2
3  2016-05-15  1    1 2016-05-14 2016-05-16    E1
4  2016-05-15  1    1 2016-05-14 2016-05-16    E2
6  2016-05-15  1    1 2016-05-14 2016-05-16    E1
7  2016-05-15  1    1 2016-05-14 2016-05-16    E2
11 2016-05-17  1    1 2016-05-17 2016-05-18    E3
14 2016-05-17  1    1 2016-05-17 2016-05-18    E3
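
The cross-join output above drops the 2018-05-17 row because the filter removes every pair for it. One possible way to get that row back, sketched here under the assumption that a helper column (called row_id below, a name made up for this example) is acceptable, is to left-merge the filtered pairs back onto df_1:

```python
import pandas as pd

df_1 = pd.DataFrame({
    'timestamp': pd.to_datetime(['2016-05-15', '2016-05-15', '2016-05-15',
                                 '2016-05-17', '2016-05-17', '2018-05-17']),
    'A': [1, 1, 1, 1, 1, 1],
})
df_2 = pd.DataFrame({
    'start': pd.to_datetime(['2016-05-14', '2016-05-14', '2016-05-17']),
    'end': pd.to_datetime(['2016-05-16', '2016-05-16', '2016-05-18']),
    'event': ['E1', 'E2', 'E3'],
})

# Remember each row's position so unmatched rows can be restored afterwards.
left = df_1.reset_index().rename(columns={'index': 'row_id'})

# Cross join through a constant key, then keep only pairs inside the interval.
matched = (
    left.assign(key=1)
        .merge(df_2.assign(key=1), on='key')
        .query('timestamp >= start and timestamp <= end')
        [['row_id', 'event']]
)

# Left merge back: rows of df_1 with no matching interval get NaN in event.
result = left.merge(matched, on='row_id', how='left').drop(columns='row_id')
print(result)
```

The cross join builds len(df_1) * len(df_2) intermediate rows, so this is reasonable for small frames only.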

One option is conditional_join from pyjanitor, which helps abstract inequality joins:

# pip install pyjanitor
import pandas as pd
import janitor

(df_1.conditional_join(
         df_2, 
         ('timestamp', 'start', '>='), 
         ('timestamp', 'end', '<='), 
         how = 'left')
     .loc[:, ['timestamp', 'A', 'B', 'event']]
)
   timestamp         A         B event
0 2016-05-15  0.020228  0.026572    E1
1 2016-05-15  0.020228  0.026572    E2
2 2016-05-15  0.057780  0.175499    E1
3 2016-05-15  0.057780  0.175499    E2
4 2016-05-15  0.098808  0.620986    E1
5 2016-05-15  0.098808  0.620986    E2
6 2016-05-17  0.158789  1.014819    E3
7 2016-05-17  0.038129  2.384590    E3
8 2018-05-17  0.011111  9.999999   NaN
