I have a dataframe like this:
date A
2015.1.1 10
2015.1.2 20
2015.1.3 30
2015.1.4 40
2015.1.5 50
2015.1.6 60
I need to calculate the std of the first N rows at each row, such as:
date A std
2015.1.1 10 std(10)
2015.1.2 20 std(10,20)
2015.1.3 30 std(10,20,30)
2015.1.4 40 std(10,20,30,40)
2015.1.5 50 std(10,20,30,40,50)
2015.1.6 60 std(10,20,30,40,50,60)
pd.rolling_std can do this for a fixed N, but how can I change N dynamically?
df[['A']].apply(lambda x:pd.rolling_std(x,N))
<class 'pandas.core.frame.DataFrame'>
Index: 75 entries, 2015-04-16 to 2015-07-31
Data columns (total 4 columns):
A 75 non-null float64
dtypes: float64(4)
memory usage: 2.9+ KB
It could be done by calling apply on the df like so:
In [29]:
def func(x):
    return df.iloc[:x.name + 1][x.index].std()
df['std'] = df[['A']].apply(func, axis=1)
df
Out[29]:
date A std
0 2015.1.1 10 NaN
1 2015.1.2 20 7.071068
2 2015.1.3 30 10.000000
3 2015.1.4 40 12.909944
4 2015.1.5 50 15.811388
5 2015.1.6 60 18.708287
This uses double subscripts [[]] to call apply on a df with a single column, which allows you to pass axis=1 so that your function is called row-wise. Inside the function you then have access to the row's index label, which is x.name, and to the column names, which are x.index. This allows you to slice your df from the first row up to the current row and calculate the std of that slice.
You can add a window arg to func to modify the window as desired.
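As a rough illustration of the window arg mentioned above, here is a minimal sketch. The name windowed_std and its window parameter are hypothetical (they are not in the original answer), and the code assumes the default integer index so that row.name is a position:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2015.1.1", "2015.1.2", "2015.1.3",
             "2015.1.4", "2015.1.5", "2015.1.6"],
    "A": [10, 20, 30, 40, 50, 60],
})

def windowed_std(row, window=None):
    # row.name is the row's index label; with the default RangeIndex
    # it is also the row's integer position.
    end = row.name + 1
    # window=None means "all rows so far"; otherwise keep only the
    # last `window` rows ending at the current row.
    start = 0 if window is None else max(0, end - window)
    return df["A"].iloc[start:end].std()

# std over every row seen so far (same values as the output above)
df["std_all"] = df[["A"]].apply(windowed_std, axis=1)
# std over at most the last 3 rows
df["std_last3"] = df[["A"]].apply(windowed_std, axis=1, args=(3,))
```

Extra positional arguments for the function are passed through apply with args, so the window can be changed per call without redefining the function.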
EDIT
It looks like your index holds strings, in which case the following should work:
In [39]:
def func(x):
    # .loc slices by label and includes the end label (.ix was removed in pandas 1.0)
    return df.loc[:x.name][x.index].std()
df['std'] = df[['A']].apply(func, axis=1)
df
Out[39]:
A std
date
2015.1.1 10 NaN
2015.1.2 20 7.071068
2015.1.3 30 10.000000
2015.1.4 40 12.909944
2015.1.5 50 15.811388
2015.1.6 60 18.708287
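For comparison, newer pandas versions offer an expanding window that computes the same cumulative std without a row-wise apply. This is an alternative technique, not the approach used above, and the sketch assumes a pandas version that provides Series.expanding:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [10, 20, 30, 40, 50, 60]},
    index=pd.Index(
        ["2015.1.1", "2015.1.2", "2015.1.3",
         "2015.1.4", "2015.1.5", "2015.1.6"],
        name="date",
    ),
)

# expanding() grows the window by one row at each step, so row i is
# the sample std (ddof=1) of rows 0..i; the first row is NaN because
# a single value has no sample std.
df["std"] = df["A"].expanding().std()
```

Because the window is defined by position, this works the same way whether the index holds strings, integers or dates.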