简体   繁体   中英

pandas compare two dataframes with criteria

I have two dataframes. df1 and df2. I would like to get whatever values are common from df1 and df2 and the dt value of df2 must be greater than df1's dt value In this case, the expected value is fee

df1 = pd.DataFrame([['2015-01-01 06:00','foo'], 
    ['2015-01-01 07:00','fee'], ['2015-01-01 08:00','fum']],
    columns=['dt', 'value'])
df1.dt=pd.to_datetime(df1.dt)

df2=pd.DataFrame([['2015-01-01 06:10','zoo'],
    ['2015-01-01 07:10','fee'],['2015-01-01 08:10','feu'],
    ['2015-01-01 09:10','boo']], columns=['dt', 'value'])
df2.dt=pd.to_datetime(df2.dt)

One way would be to merge on 'value' column so this will produce only matching rows, you can then filter the merged df using the 'dt_x', 'dt_y' columns:

In [15]:

merged = df2.merge(df1, on='value')
merged[merged['dt_x'] > merged['dt_y']]
Out[15]:
                 dt_x value                dt_y
0 2015-01-01 07:10:00   fee 2015-01-01 07:00:00

You can't do something like the following because the lengths don't match:

df2[ (df2['value'].isin(df1['value'])) & (df2['dt'] > df1['dt']) ]

raises:

ValueError: Series lengths must match to compare

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM