pandas - calculating rolling variance in rank order

Question

I have the following two dataframes:

          1         2         3         4         5         6
0  0.022135  0.007161  0.002604  0.009847  0.004476  0.003255
1  0.011515  0.000529  0.009481  0.003215  0.002157  0.003621
2  0.011556  0.000326  0.009440  0.003255  0.002116  0.003581
3  0.011556  0.000326  0.009440  0.003255  0.002116  0.003581
4  0.011556  0.000326  0.009440  0.003255  0.002116  0.003581
5  0.011556  0.000326  0.009196  0.003255  0.002360  0.003581
6  0.011353  0.000366  0.009155  0.003296  0.002319  0.003540
7  0.011353  0.000610  0.009155  0.003296  0.002563  0.003540
8  0.011312  0.000570  0.008952  0.003255  0.002604  0.003581
9  0.011312  0.000570  0.008952  0.003255  0.002604  0.003581

         1  2  3  4  5  6
level_0                  
0        3  6  5  2  4  1
1        2  5  4  6  3  1
2        2  5  4  6  3  1
3        2  5  4  6  3  1
4        2  5  4  6  3  1
5        2  5  4  6  3  1
6        2  5  4  6  3  1
7        2  5  4  6  3  1
8        2  5  4  6  3  1
9        2  5  4  6  3  1

I would like to get the rolling variance across each row of the first dataframe, taking the values in the order specified by the 2nd dataframe. This rolling variance needs to go into a new column in the first dataframe, where I can associate it with the original column value.

For example, the first row in the 2nd dataframe is [3, 6, 5, 2, 4, 1].

The first row in the 1st dataframe is [0.022135, 0.007161, 0.002604, 0.009847, 0.004476, 0.003255]

The rolling variance is, therefore:

var([0.002604]), in column 3
var([0.002604, 0.003255]), in column 6

et cetera.

Further, I need to know the number of values used in this rolling variance.

So the first row of my result will be:

(var[0.002604], 1) in column 3
(var[0.002604, 0.003255], 2) in column 6

et cetera

What is a quick way to do this, ideally without using apply()? My suspicion is that this is impossible.

Answer 1

You can convert the 2nd dataframe, which holds the rank information, into an ndarray of column positions, and then use one of NumPy's indexing tools to reorder each row of the original dataframe by that array. Below is an example that uses numpy.take() for the reordering.

Convert the 2nd dataframe from ranking to indexing (from 1-based to 0-based):
```
 df_rank = df_rank - 1 
```

Reconstruct the dataframe with np.take():

    df_new = pd.DataFrame(
        [np.take(df.values[i, :], df_rank.values[i, :]) for i in range(df.shape[0])],
        columns=df.columns,
    )
    # In [96]: df_new
    # Out[96]:
    #           1         2         3         4         5         6
    # 0  0.002604  0.003255  0.004476  0.007161  0.009847  0.022135
    # 1  0.000529  0.002157  0.003215  0.003621  0.009481  0.011515
    # 2  0.000326  0.002116  0.003255  0.003581  0.009440  0.011556
    # 3  0.000326  0.002116  0.003255  0.003581  0.009440  0.011556
    # 4  0.000326  0.002116  0.003255  0.003581  0.009440  0.011556
    # 5  0.000326  0.002360  0.003255  0.003581  0.009196  0.011556
    # 6  0.000366  0.002319  0.003296  0.003540  0.009155  0.011353
    # 7  0.000610  0.002563  0.003296  0.003540  0.009155  0.011353
    # 8  0.000570  0.002604  0.003255  0.003581  0.008952  0.011312
    # 9  0.000570  0.002604  0.003255  0.003581  0.008952  0.011312
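If the row-by-row list comprehension turns out to be slow, the same reordering can probably be done in one vectorized call. This is only a sketch, assuming NumPy 1.15 or later (for np.take_along_axis) and using just the first two rows of the question's data. It builds the 0-based positions in a separate variable, `order`, rather than overwriting `df_rank`.

```python
import numpy as np
import pandas as pd

# First two rows of the question's data, kept small for illustration.
df = pd.DataFrame(
    [[0.022135, 0.007161, 0.002604, 0.009847, 0.004476, 0.003255],
     [0.011515, 0.000529, 0.009481, 0.003215, 0.002157, 0.003621]],
    columns=[1, 2, 3, 4, 5, 6],
)
# Column labels in the order they should be visited (1-based).
df_rank = pd.DataFrame(
    [[3, 6, 5, 2, 4, 1],
     [2, 5, 4, 6, 3, 1]],
    columns=[1, 2, 3, 4, 5, 6],
)

# 0-based column positions, one row of positions per data row.
order = df_rank.to_numpy() - 1

# For every row, pick the values at that row's positions.
reordered = np.take_along_axis(df.to_numpy(), order, axis=1)
df_new = pd.DataFrame(reordered, index=df.index, columns=df.columns)
```

Note that the column labels of df_new no longer name the original columns. After the reordering they only mean "1st value in rank order", "2nd value in rank order", and so on.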

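Then do whatever you need on the reordered dataframe, for example an expanding population variance (ddof=0) along each row:
```
 # newer pandas releases deprecate axis=1 here; df_new.T.expanding(min_periods=1).var(ddof=0).T is an equivalent
 df_new.expanding(min_periods=1, axis=1).var(ddof=0)
```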
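The question also asks for each variance to sit next to the original column value, together with the number of values used. One way this might be done without apply() is sketched below. It assumes population variance (ddof=0, as in the var(0) call above), computes it from cumulative sums, and scatters the results back with np.put_along_axis. The names `var_df`, `n_df` and `result` are made up for this illustration, and only the first two rows of the question's data are used.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0.022135, 0.007161, 0.002604, 0.009847, 0.004476, 0.003255],
     [0.011515, 0.000529, 0.009481, 0.003215, 0.002157, 0.003621]],
    columns=[1, 2, 3, 4, 5, 6],
)
df_rank = pd.DataFrame(
    [[3, 6, 5, 2, 4, 1],
     [2, 5, 4, 6, 3, 1]],
    columns=[1, 2, 3, 4, 5, 6],
)

order = df_rank.to_numpy() - 1                        # 0-based positions
ordered = np.take_along_axis(df.to_numpy(), order, axis=1)

# Size of each expanding window: 1, 2, ..., number of columns.
counts = np.arange(1, ordered.shape[1] + 1)

# Population variance from cumulative sums: E[x^2] - (E[x])^2.
mean = np.cumsum(ordered, axis=1) / counts
mean_sq = np.cumsum(ordered ** 2, axis=1) / counts
var_ordered = np.maximum(mean_sq - mean ** 2, 0.0)    # clip tiny negative round-off

# Scatter each result back under the column whose value completed the window.
var_by_col = np.empty_like(var_ordered)
np.put_along_axis(var_by_col, order, var_ordered, axis=1)
n_by_col = np.empty_like(order)
np.put_along_axis(n_by_col, order, counts, axis=1)

var_df = pd.DataFrame(var_by_col, index=df.index, columns=df.columns)
n_df = pd.DataFrame(n_by_col, index=df.index, columns=df.columns)

# Side-by-side view with a two-level column index: (kind, original column).
result = pd.concat({"value": df, "var": var_df, "n": n_df}, axis=1)
```

With this layout, result[("var", 6)] and result[("n", 6)] hold the variance and count that belong to the value in column 6. The cumulative-sum formula can lose precision when the values are large compared with their spread. If that matters, the transposed expanding().var() route is the safer choice, with the same scatter step applied afterwards.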
