简体   繁体   中英

Pandas find the index of the closest value in a data frame column given a list of values

So I am trying to find a vectorized way to do this, say I have a data frame DF1 that contains a timestamp column with a lot of timestamps in increasing order. I also have a list with a target time and I want to use this to find the row or index of that row in DF1 that contains the closest timestamp to that target time. I have a solution that uses iterrows but I was wondering if there is a way to do it without iterrows as this might take too long.

The number of rows in the data frame are always bigger than the list of target time values. If the target time is in between two time stamps then always pick the smallest time stamp.

#Dataframe with timestamp
In [1]: df = pd.DataFrame([100, 150, 155, 180,185, 200], columns=['TimeStamp'])
#List of target times
targetTimes = [120, 130, 180, 187]

The answer should be:

#A list of indices
[0, 1, 3, 4]
#Or something like this
Out[1]: 
   TimeStamp
0    100
1    150
2    180
3    185 

Use merge_asof with convert list to one column DataFrame :

df1 = pd.DataFrame(targetTimes, columns=['new'])

df = pd.merge_asof(df1, 
                   df, 
                   left_on='new', 
                   right_on='TimeStamp',
                   direction='nearest')
print (df)
   new  TimeStamp
0  120        100
1  130        150
2  180        180
3  187        185

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM