
Quickest way to look up nearest index value

Consider the time series s and its index tidx:

import pandas as pd

tidx = pd.date_range('2010-12-31', periods=3, freq='M')
s = pd.Series([0, 31, 59], tidx)

If I wanted to use s as a lookup series and passed the date '2011-02-23', I'd want to get the most recently available value. In this case that would be 31.

I've tried:

s.resample('D').ffill().loc['2011-02-23']

31

This does the job, but I had to resample the whole series just to get a single value. What is a more appropriate way to do this?

You could use searchsorted -

s[s.index.searchsorted('2011-02-23','right')-1]

It's fun when you beat your own solution! Here's a bit more NumPy in the mix for a further performance boost -

s[s.index.values.searchsorted(np.datetime64('2011-02-23'),'right')-1]
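The searchsorted idea above can be wrapped in a small helper. This is only a sketch: lookup_latest is a hypothetical name, the index is assumed to be sorted in ascending order, and the index is built explicitly rather than with date_range. It uses .iloc for the positional lookup because integer-position access through plain s[...] is, as far as I know, deprecated in newer pandas versions. It also guards the case where the date falls before the first index entry, where the -1 would otherwise silently wrap around to the last row.

```python
import pandas as pd


def lookup_latest(series, when):
    """Return the value at the last index entry <= `when` (hypothetical helper).

    Assumes `series` has a DatetimeIndex sorted in ascending order.
    """
    ts = pd.Timestamp(when)
    # side='right' returns the insertion point after any entry equal to `ts`,
    # so subtracting 1 gives the position of the last entry <= ts.
    pos = series.index.searchsorted(ts, side='right') - 1
    if pos < 0:
        # Without this check, position -1 would select the last row.
        raise KeyError(f"{when!r} is earlier than the first index entry")
    return series.iloc[pos]


# Same data as in the question, with the month-end dates written out.
tidx = pd.DatetimeIndex(['2010-12-31', '2011-01-31', '2011-02-28'])
s = pd.Series([0, 31, 59], index=tidx)

print(lookup_latest(s, '2011-02-23'))  # 31
```

A date that matches an index entry exactly returns that entry's own value, which is the reason for side='right' rather than the default 'left'.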

Runtime test -

In [235]: tidx = pd.date_range('2010-12-31', periods=300, freq='M')
     ...: s = pd.Series(range(300), tidx)
     ...: 

In [236]: s[s.index.searchsorted('2035-03-23','right')-1]
Out[236]: 290

In [237]: s[s.index.values.searchsorted(np.datetime64('2035-03-23'),'right')-1]
Out[237]: 290

In [238]: %timeit s[s.index.searchsorted('2035-03-23','right')-1]
10000 loops, best of 3: 63 µs per loop

In [239]: %timeit s[s.index.values.searchsorted(np.datetime64('2035-03-23'),'right')-1]
10000 loops, best of 3: 46.7 µs per loop

What about this?

In [150]: s[s.index <= '2011-02-23'].tail(1)
Out[150]:
2011-01-31    31
Freq: M, dtype: int64

PS: this works only if the index is sorted.
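To illustrate the sorting caveat, here is a minimal sketch with the same three values in a deliberately shuffled order (the names unsorted, naive, ordered and latest are made up for this example). On the shuffled index, tail(1) returns whichever matching row happens to come last, not the most recent one; sorting the index first restores the intended result.

```python
import pandas as pd

# Same values as in the question, but with the rows out of date order.
unsorted = pd.Series(
    [31, 59, 0],
    index=pd.DatetimeIndex(['2011-01-31', '2011-02-28', '2010-12-31']),
)
cutoff = pd.Timestamp('2011-02-23')

# The index is not ascending, so "last matching row" is not "most recent".
print(unsorted.index.is_monotonic_increasing)  # False

# Picks the 2010-12-31 row (value 0), which is the wrong answer here.
naive = unsorted[unsorted.index <= cutoff].tail(1)

# Sorting first makes tail(1) pick the 2011-01-31 row (value 31).
ordered = unsorted.sort_index()
latest = ordered[ordered.index <= cutoff].tail(1)
```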

I used s.index.get_loc() (see the docs). It can find the location of the "closest" index value:

s.iloc[s.index.get_loc('2011-02-23', 'ffill')]
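As far as I know, the method argument of get_loc was deprecated in pandas 1.4 and removed in 2.0, so the line above may fail on a recent install. A sketch of the equivalent lookup with Index.get_indexer, which takes a list of targets and still accepts method, could look like this (the index is written out explicitly, and the variable names are only illustrative):

```python
import pandas as pd

tidx = pd.DatetimeIndex(['2010-12-31', '2011-01-31', '2011-02-28'])
s = pd.Series([0, 31, 59], index=tidx)

# get_indexer works on a sequence of targets, so wrap the single date.
target = pd.DatetimeIndex(['2011-02-23'])

# 'ffill' matches the last index entry <= target; -1 means no match.
pos = s.index.get_indexer(target, method='ffill')[0]
value = s.iloc[pos] if pos != -1 else None
print(value)  # 31

# 'nearest' picks the closest entry in either direction instead,
# which here is 2011-02-28 (position 2).
nearest_pos = s.index.get_indexer(target, method='nearest')[0]
```

Note that 'ffill' and 'nearest' answer different questions: the first gives the most recently available value, the second the value whose date is closest, even if that date lies in the future.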
