
pandas rolling apply on a dataframe

I have Yahoo stock data that I would like to manipulate. It is loaded like so:

import pandas as pd
import pandas.io.data as web
data = web.DataReader('SPY','yahoo')
data.head()


Out[13]:
            Open    High    Low     Close   Volume  Adj Close
Date                        
2010-01-04  112.37  113.39  111.51  113.33  118944600   103.44
2010-01-05  113.26  113.68  112.85  113.63  111579900   103.71
2010-01-06  113.52  113.99  113.43  113.71  116074400   103.79
2010-01-07  113.50  114.33  113.18  114.19  131091100   104.23
2010-01-08  113.89  114.62  113.66  114.57  126402800   104.57

For any given date, I would like to look forward 2 days and find the lowest Low quote over those days. So, for 2010-01-04, the correct answer would be 112.85 (the lower of 112.85 on 2010-01-05 and 113.43 on 2010-01-06).

Now, I could iterate over all the dates with a for loop and get what I want. But I would like to figure out if I could do this in a vectorized manner. Maybe by using a rolling_apply lambda function. This is what I have done so far...

def foo(x):
    today = x[0]
    forward = x[1:]
    return (forward.min())
pd.rolling_apply(data,2,foo)

This does not work, since rolling_apply works on one Series at a time and the function does not have access to the other columns of the data frame.

Is there some neat way to do this?

Rather than calling rolling_apply on the whole dataframe, just call it on the column of interest and pass min as the function:

pd.rolling_apply(data['Low'],2,min)
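Note that pd.rolling_apply was deprecated and later removed from pandas, and that rolling windows are trailing: each row sees itself and the rows before it. A minimal sketch of the forward-looking version with the current Series.rolling API follows, assuming "look forward 2 days" means the next two rows excluding today. The names low, window and forward_low are illustrative, and the values are copied from the data.head() output in the question.

```python
import pandas as pd

# Sample rows copied from the question's data.head() output.
low = pd.Series(
    [111.51, 112.85, 113.43, 113.18, 113.66],
    index=pd.to_datetime(
        ["2010-01-04", "2010-01-05", "2010-01-06", "2010-01-07", "2010-01-08"]
    ),
    name="Low",
)

window = 2  # number of future rows to inspect (assumption: today is excluded)

# rolling(window).min() covers rows [t-1, t]; shifting the result up by
# `window` rows makes row t hold min(Low[t+1], Low[t+2]).
forward_low = low.rolling(window).min().shift(-window)

print(forward_low)
```

The last `window` rows come out as NaN because there are not enough future rows to fill the window.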

Interestingly, the built-in min function outperforms numpy's np.min here, which is perhaps not that surprising given that each call only finds the lowest value of a 2-element array:

In [26]:

%timeit pd.rolling_apply(data['Low'],2,np.min)
%timeit pd.rolling_apply(data['Low'],2,min)
10 loops, best of 3: 15.4 ms per loop
1000 loops, best of 3: 1.44 ms per loop
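For readers on a current pandas, the same comparison can be set up with Series.rolling. The sketch below only checks that the three spellings agree on the sample values from the question; it makes no timing claim. The variable names are illustrative, and raw=True is assumed to be available (it passes each window as a plain numpy array rather than a Series).

```python
import numpy as np
import pandas as pd

# Low column values copied from the question's data.head() output.
low = pd.Series([111.51, 112.85, 113.43, 113.18, 113.66], name="Low")

# Built-in min applied to each 2-row window.
via_builtin = low.rolling(2).apply(min, raw=True)

# numpy's min applied to each 2-row window.
via_numpy = low.rolling(2).apply(np.min, raw=True)

# The dedicated rolling minimum, which avoids a Python call per window.
via_method = low.rolling(2).min()

print(via_builtin.equals(via_method), via_numpy.equals(via_method))
```

The dedicated .min() method is likely the fastest of the three, since it does not call back into Python for every window, but that is worth confirming with %timeit on your own data.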


 