For this project, I have two dataframes one called df1 and another called df2 . These dataframes are not the same size (don't think that matters).
Each of them have a datetime in the first column. What I am trying to do is:
I want to make a new dataframe df3 that contains the remaining data from the df2 but only for the when the original times in the first column are closest to the df1 .
Here is an example of what the dataframes might look like:
print (df1)
Output:
Date Val
0 2015-02-24 00:00:02 1.764052
1 2015-02-24 00:01:15 0.400157
2 2015-02-24 00:02:22 0.978738
3 2015-02-24 00:03:39 2.240893
4 2015-02-24 00:04:00 1.867558
print (df2)
Output:
Date Val Name
0 2015-02-24 00:00:00 -0.977278 John
1 2015-02-24 00:01:00 0.950088 Robert
2 2015-02-24 00:02:00 -0.103219 Sam
3 2015-02-24 00:03:00 0.151357 Tim
4 2015-02-24 00:04:00 0.410599 Hector
5 2015-02-24 00:05:00 0.673247 Melissa
Then what I want to be able to retrieve is something like this:
print (df2)
Output:
Date Val Name
0 2015-02-24 00:00:00 -0.977278 John
1 2015-02-24 00:01:00 0.950088 Robert
2 2015-02-24 00:02:00 -0.103219 Sam
3 2015-02-24 00:04:00 0.410599 Hector
4 2015-02-24 00:04:00 0.410599 Hector
5 2015-02-24 00:05:00 0.673247 Melissa
I have searched around a bit and found that there were two similar posts on here example-1 example-2 , but the difference being they just want a single value returned or a single row. For my purposes I want it to be 'Filtered' so to speak.
If anyone can provide any insight, that would be greatly appreciated, thank you.
If I understand correctly, I believe this gives you what you're looking for.
df1['df2_idx'] = df1.Date.apply(lambda x: [(abs(df2['Date'] - x)).idxmin()][0])
df3 = df2.reindex(df1['df2_idx'], axis=0).reset_index().drop(['df2_idx'], axis=1)
The first line just finds the row in df2
that is closest to each row in df1
and appends the indices as a column to df1
like this:
Date Val df2_idx
0 2015-02-24 00:00:02 1.764052 0
1 2015-02-24 00:01:15 0.400157 1
2 2015-02-24 00:02:22 0.978738 2
3 2015-02-24 00:03:39 2.240893 4
4 2015-02-24 00:04:00 1.867558 4
The second row then just subsets df2
with the column of indices. The final output is
Date Val Name
0 2015-02-24 00:00:00 -0.977278 John
1 2015-02-24 00:01:00 0.950088 Robert
2 2015-02-24 00:02:00 -0.103219 Sam
3 2015-02-24 00:04:00 0.410599 Hector
4 2015-02-24 00:04:00 0.410599 Hector
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.