
pandas - calculating rolling variance in rank order

I have the following two dataframes:

          1         2         3         4         5         6
0  0.022135  0.007161  0.002604  0.009847  0.004476  0.003255
1  0.011515  0.000529  0.009481  0.003215  0.002157  0.003621
2  0.011556  0.000326  0.009440  0.003255  0.002116  0.003581
3  0.011556  0.000326  0.009440  0.003255  0.002116  0.003581
4  0.011556  0.000326  0.009440  0.003255  0.002116  0.003581
5  0.011556  0.000326  0.009196  0.003255  0.002360  0.003581
6  0.011353  0.000366  0.009155  0.003296  0.002319  0.003540
7  0.011353  0.000610  0.009155  0.003296  0.002563  0.003540
8  0.011312  0.000570  0.008952  0.003255  0.002604  0.003581
9  0.011312  0.000570  0.008952  0.003255  0.002604  0.003581

         1  2  3  4  5  6
level_0                  
0        3  6  5  2  4  1
1        2  5  4  6  3  1
2        2  5  4  6  3  1
3        2  5  4  6  3  1
4        2  5  4  6  3  1
5        2  5  4  6  3  1
6        2  5  4  6  3  1
7        2  5  4  6  3  1
8        2  5  4  6  3  1
9        2  5  4  6  3  1

I would like to get the rolling (strictly, expanding) variance across each row of the first dataframe, taking the columns in the order specified by the corresponding row of the 2nd dataframe. This variance needs to go into a new column in the first dataframe, where I can associate it with the original column value.

For example, the first row in the 2nd dataframe is [3, 6, 5, 2, 4, 1].

The first row in the 1st dataframe is [0.022135, 0.007161, 0.002604, 0.009847, 0.004476, 0.003255].

The rolling variance is, therefore:

var([0.002604]), in column 3
var([0.002604, 0.003255]), in column 6

et cetera.

Further, I need to know the number of values used in this rolling variance.

So the first row of my result will be:

(var([0.002604]), 1) in column 3
(var([0.002604, 0.003255]), 2) in column 6

et cetera
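To pin down the expected output, here is a plain-loop reference for the first row. It is only a sketch, not part of the original question: it assumes population variance (ddof=0, NumPy's default, so a one-value window gives 0.0 rather than NaN) and that column label k sits at position k - 1. It is deliberately slow and only defines the (variance, count) pair each column should receive.

```python
import numpy as np

# First row of each dataframe from the question
values = np.array([0.022135, 0.007161, 0.002604, 0.009847, 0.004476, 0.003255])
order = [3, 6, 5, 2, 4, 1]  # column labels in the order they are visited

# Assumption: label k is stored at position k - 1
result = {}
for k in range(1, len(order) + 1):
    window = values[[label - 1 for label in order[:k]]]
    # np.var uses ddof=0 by default (population variance)
    result[order[k - 1]] = (np.var(window), k)

for label in order:
    v, n = result[label]
    print(f"column {label}: var={v:.3e}, n={n}")
```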

What is a quick way to do this, ideally without using apply()? My suspicion is that this is impossible.

You can convert the 2nd dataframe, which holds for each row the column labels in the order to visit them, into an ndarray of 0-based column indices, and then use one of NumPy's indexing tools to reorder each row of the original dataframe by that array. Below is an example using numpy.take() to do the transformation.

  1. convert the 2nd dataframe from 1-based column labels to 0-based column positions (this works because the labels 1 to 6 coincide with positions 0 to 5):

     df_rank = df_rank - 1  # 1-based labels -> 0-based positions
  2. reconstruct the dataframe row by row with np.take():

     import numpy as np
     import pandas as pd

     # row i of df_new is row i of df reordered by the positions in row i of df_rank;
     # the labels 1..6 of df_new now denote rank positions, not the original columns
     df_new = pd.DataFrame([np.take(df.values[i, :], df_rank.values[i, :])
                            for i in range(df.shape[0])],
                           columns=df.columns)

     # In [96]: df_new
     # Out[96]:
     #           1         2         3         4         5         6
     # 0  0.002604  0.003255  0.004476  0.007161  0.009847  0.022135
     # 1  0.000529  0.002157  0.003215  0.003621  0.009481  0.011515
     # 2  0.000326  0.002116  0.003255  0.003581  0.009440  0.011556
     # 3  0.000326  0.002116  0.003255  0.003581  0.009440  0.011556
     # 4  0.000326  0.002116  0.003255  0.003581  0.009440  0.011556
     # 5  0.000326  0.002360  0.003255  0.003581  0.009196  0.011556
     # 6  0.000366  0.002319  0.003296  0.003540  0.009155  0.011353
     # 7  0.000610  0.002563  0.003296  0.003540  0.009155  0.011353
     # 8  0.000570  0.002604  0.003255  0.003581  0.008952  0.011312
     # 9  0.000570  0.002604  0.003255  0.003581  0.008952  0.011312
  3. compute whatever you need on the reordered dataframe as usual, e.g. the expanding variance along each row:

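     # expanding population variance (ddof=0) per row; transposing avoids the deprecated expanding(axis=1)
     df_new.T.expanding(min_periods=1).var(ddof=0).T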
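The full result, including the counts and the mapping back to the original columns, can also be obtained without apply() or a Python-level row loop. The sketch below is an alternative to the steps above, not the answer's own method. It assumes each row of df_rank is a permutation of the labels 1 to 6, that those labels coincide with positions 0 to 5, and population variance (ddof=0). It swaps the np.take loop for np.take_along_axis, computes every expanding window at once with a lower-triangular mask and np.nanvar, and uses np.put_along_axis to write the variances and window sizes back under their original column labels. Only the first two rows of the question's data are used; the names vals, idx, ordered and out are illustrative.

```python
import numpy as np
import pandas as pd

# First two rows of each dataframe from the question (labels 1..6)
df = pd.DataFrame(
    [[0.022135, 0.007161, 0.002604, 0.009847, 0.004476, 0.003255],
     [0.011515, 0.000529, 0.009481, 0.003215, 0.002157, 0.003621]],
    columns=[1, 2, 3, 4, 5, 6],
)
df_rank = pd.DataFrame(
    [[3, 6, 5, 2, 4, 1],
     [2, 5, 4, 6, 3, 1]],
    columns=[1, 2, 3, 4, 5, 6],
)

vals = df.to_numpy()
idx = df_rank.to_numpy() - 1  # assumption: label k sits at position k - 1
m = vals.shape[1]

# Reorder every row at once instead of looping with np.take
ordered = np.take_along_axis(vals, idx, axis=1)

# Window k covers ordered positions 0..k; mask[k, j] is True for j <= k
mask = np.tri(m, dtype=bool)
windows = np.where(mask, ordered[:, None, :], np.nan)  # shape (rows, m, m)
var_ordered = np.nanvar(windows, axis=2)  # population variance, ddof=0
counts = np.arange(1, m + 1)              # window sizes 1..m

# Write each result back under the column label it belongs to
var_orig = np.empty_like(var_ordered)
np.put_along_axis(var_orig, idx, var_ordered, axis=1)
count_orig = np.empty_like(idx)
np.put_along_axis(count_orig, idx, np.broadcast_to(counts, idx.shape), axis=1)

# Values, variances and counts side by side, keyed by the original labels
out = pd.concat(
    {
        "value": df,
        "var": pd.DataFrame(var_orig, index=df.index, columns=df.columns),
        "count": pd.DataFrame(count_orig, index=df.index, columns=df.columns),
    },
    axis=1,
)
print(out)
```

The mask costs O(rows × m²) memory in exchange for a short, numerically stable computation; with six columns that is negligible, but for wide frames the pandas expanding route from step 3 (with expanding().count() for the window sizes) scales better.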
