
How to filter one dataframe by the nearest times in another?

For this project, I have two dataframes, df1 and df2. They are not the same size (I don't think that matters).

Each of them has a datetime in its first column. Here is what I am trying to do:

I want to make a new dataframe, df3, that contains the rows of df2 whose times in the first column are closest to each of the times in df1.

Here is an example of what the dataframes might look like:

 print (df1)
 Output:
                 Date       Val
 0 2015-02-24 00:00:02  1.764052
 1 2015-02-24 00:01:15  0.400157
 2 2015-02-24 00:02:22  0.978738
 3 2015-02-24 00:03:39  2.240893
 4 2015-02-24 00:04:00  1.867558

 print (df2)
 Output:
                 Date       Val      Name  
 0 2015-02-24 00:00:00  -0.977278    John
 1 2015-02-24 00:01:00   0.950088    Robert
 2 2015-02-24 00:02:00  -0.103219    Sam
 3 2015-02-24 00:03:00   0.151357    Tim
 4 2015-02-24 00:04:00   0.410599    Hector 
 5 2015-02-24 00:05:00   0.673247    Melissa 
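To make the question reproducible, here is a sketch that rebuilds the two sample frames from the printouts above. The values and the column names Date, Val and Name are copied from the question; parsing the timestamps with pd.to_datetime is an assumption about how the real data is typed (the Date columns need to be datetime64 for the time arithmetic below to work).

```python
import pandas as pd

# Sample data copied from the printouts in the question
df1 = pd.DataFrame({
    'Date': pd.to_datetime(['2015-02-24 00:00:02', '2015-02-24 00:01:15',
                            '2015-02-24 00:02:22', '2015-02-24 00:03:39',
                            '2015-02-24 00:04:00']),
    'Val': [1.764052, 0.400157, 0.978738, 2.240893, 1.867558],
})

df2 = pd.DataFrame({
    'Date': pd.to_datetime(['2015-02-24 00:00:00', '2015-02-24 00:01:00',
                            '2015-02-24 00:02:00', '2015-02-24 00:03:00',
                            '2015-02-24 00:04:00', '2015-02-24 00:05:00']),
    'Val': [-0.977278, 0.950088, -0.103219, 0.151357, 0.410599, 0.673247],
    'Name': ['John', 'Robert', 'Sam', 'Tim', 'Hector', 'Melissa'],
})

print(df1)
print(df2)
```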

Then what I want to be able to retrieve is something like this:

 print (df3)
 Output:
                 Date       Val      Name  
 0 2015-02-24 00:00:00  -0.977278    John
 1 2015-02-24 00:01:00   0.950088    Robert
 2 2015-02-24 00:02:00  -0.103219    Sam
 3 2015-02-24 00:04:00   0.410599    Hector
 4 2015-02-24 00:04:00   0.410599    Hector 

I have searched around a bit and found two similar posts here (example-1, example-2), but they only want a single value or a single row returned. For my purposes, I want df2 to be 'filtered', so to speak: one matching row for every row of df1.

Any insight would be greatly appreciated. Thank you.

If I understand correctly, this gives you what you're looking for:

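# For each df1 time, get the index label of the df2 row with the smallest absolute time gap
df1['df2_idx'] = df1['Date'].apply(lambda x: (df2['Date'] - x).abs().idxmin())
# Select those df2 rows (repeats allowed) and renumber the index 0..n-1
df3 = df2.reindex(df1['df2_idx']).reset_index(drop=True)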

The first statement finds, for each row of df1, the index of the df2 row whose Date is closest, and stores those indices in a new column of df1:

                 Date       Val  df2_idx
0 2015-02-24 00:00:02  1.764052        0
1 2015-02-24 00:01:15  0.400157        1
2 2015-02-24 00:02:22  0.978738        2
3 2015-02-24 00:03:39  2.240893        4
4 2015-02-24 00:04:00  1.867558        4
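The apply in the first statement scans all of df2 once per row of df1 at Python speed. As a hedged alternative (not part of the original answer), the same indices can be computed in one shot with NumPy broadcasting, assuming both Date columns are tz-naive datetime64 and that an n x m matrix of time gaps fits in memory:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Date': pd.to_datetime([
    '2015-02-24 00:00:02', '2015-02-24 00:01:15', '2015-02-24 00:02:22',
    '2015-02-24 00:03:39', '2015-02-24 00:04:00'])})
df2 = pd.DataFrame({'Date': pd.to_datetime([
    '2015-02-24 00:00:00', '2015-02-24 00:01:00', '2015-02-24 00:02:00',
    '2015-02-24 00:03:00', '2015-02-24 00:04:00', '2015-02-24 00:05:00'])})

# Pairwise absolute gaps: row i is a df1 time, column j is a df2 time (5 x 6 here)
t1 = df1['Date'].to_numpy()
t2 = df2['Date'].to_numpy()
gaps = np.abs(t1[:, None] - t2[None, :])

# argmin gives a position in df2 (first one wins on ties, like idxmin);
# map the positions back to df2's index labels
df1['df2_idx'] = df2.index[gaps.argmin(axis=1)]
print(df1)
```

For the sample data this yields the same df2_idx column as the table above. The trade-off is memory: the gap matrix grows with len(df1) * len(df2).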

The second statement then selects those rows from df2 using that column of indices. The final output is:

                 Date       Val    Name
0 2015-02-24 00:00:00 -0.977278    John
1 2015-02-24 00:01:00  0.950088  Robert
2 2015-02-24 00:02:00 -0.103219     Sam
3 2015-02-24 00:04:00  0.410599  Hector
4 2015-02-24 00:04:00  0.410599  Hector
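A different technique, swapped in here rather than taken from the answer above, is pandas' merge_asof with direction='nearest', which performs the nearest-time lookup as a sorted merge. This is a sketch that assumes both frames can be sorted on Date (merge_asof requires sorted keys, so df3 comes out in sorted df1 order). The helper column name key is hypothetical; it exists only so that df3 keeps df2's own Date values instead of df1's.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Date': pd.to_datetime(['2015-02-24 00:00:02', '2015-02-24 00:01:15',
                            '2015-02-24 00:02:22', '2015-02-24 00:03:39',
                            '2015-02-24 00:04:00']),
    'Val': [1.764052, 0.400157, 0.978738, 2.240893, 1.867558],
})
df2 = pd.DataFrame({
    'Date': pd.to_datetime(['2015-02-24 00:00:00', '2015-02-24 00:01:00',
                            '2015-02-24 00:02:00', '2015-02-24 00:03:00',
                            '2015-02-24 00:04:00', '2015-02-24 00:05:00']),
    'Val': [-0.977278, 0.950088, -0.103219, 0.151357, 0.410599, 0.673247],
    'Name': ['John', 'Robert', 'Sam', 'Tim', 'Hector', 'Melissa'],
})

# merge_asof needs both sides sorted on the merge key;
# 'key' is a hypothetical helper column holding the timestamps to match on
left = df1[['Date']].sort_values('Date').rename(columns={'Date': 'key'})
right = df2.sort_values('Date').assign(key=lambda d: d['Date'])

# For each df1 time, take the df2 row whose key is closest in either direction
df3 = pd.merge_asof(left, right, on='key', direction='nearest').drop(columns='key')
print(df3)
```

On the sample data this picks John, Robert, Sam, Hector, Hector, matching the output above.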
