
Pandas - Using `.rolling()` on multiple columns

Consider a pandas DataFrame which looks like the one below:

      A     B     C
0  0.63  1.12  1.73
1  2.20 -2.16 -0.13
2  0.97 -0.68  1.09
3 -0.78 -1.22  0.96
4 -0.06 -0.02  2.18

I would like to use the function .rolling() to perform the following calculation for t = 0, 1, 2:

  • Select the rows from t to t+2
  • Take the 9 values contained in those 3 rows, from all the columns. Call this set S
  • Compute the 75th percentile of S (or other summary statistics about S)


For instance, for t = 1 we have S = {2.2, -2.16, -0.13, 0.97, -0.68, 1.09, -0.78, -1.22, 0.96} and the 75th percentile is 0.97.
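The t = 1 example can be checked directly. This is a minimal sketch, assuming the DataFrame above is named df and that the percentile uses linear interpolation (numpy's default):

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [0.63, 2.20, 0.97, -0.78, -0.06],
    "B": [1.12, -2.16, -0.68, -1.22, -0.02],
    "C": [1.73, -0.13, 1.09, 0.96, 2.18],
})

t = 1
# Rows t..t+2, all columns, flattened into one array of 9 values.
S = df.iloc[t:t + 3].to_numpy().ravel()
p75 = np.percentile(S, 75)
print(p75)  # 0.97
```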

I couldn't find a way to make it work with .rolling(), since it apparently takes each column separately. I'm now relying on a for loop, but it is really slow.

Do you have any suggestions for a more efficient approach?

One solution is to stack the data, multiply your window size by the number of columns, and then keep only every cols-th value of the result (one value per original row). Also, since you want a forward-looking window, reverse the order of the stacked DataFrame first.

wsize = 3
cols = len(df.columns)

df.stack(dropna=False)[::-1].rolling(window=wsize*cols).quantile(0.75)[cols-1::cols].reset_index(-1, drop=True).sort_index()

Output:

0    1.12
1    0.97
2    0.97
3     NaN
4     NaN
dtype: float64
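To see why the one-liner works, it may help to split it into named steps. The sketch below is one reading of it, not a different method; the intermediate names (stacked, backwards, rolled, per_row) are made up for illustration. It calls stack() without dropna=False, which is an assumption that only holds because the sample data has no missing values.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0.63, 2.20, 0.97, -0.78, -0.06],
    "B": [1.12, -2.16, -0.68, -1.22, -0.02],
    "C": [1.73, -0.13, 1.09, 0.96, 2.18],
})
wsize = 3
cols = len(df.columns)

# 1. One long Series in row-major order: (0, A), (0, B), (0, C), (1, A), ...
stacked = df.stack()

# 2. Reverse it, so a backward-looking window over the reversed data
#    covers rows t..t+2 of the original frame.
backwards = stacked[::-1]

# 3. A window of wsize * cols values spans exactly wsize full rows
#    whenever it ends on the first column of a row.
rolled = backwards.rolling(window=wsize * cols).quantile(0.75)

# 4. Keep only the windows that end on column A (every cols-th position).
per_row = rolled.iloc[cols - 1::cols]

# 5. Drop the column level of the index and restore the original row order.
result = per_row.reset_index(level=-1, drop=True).sort_index()
print(result)
```

Rows 3 and 4 come out as NaN because fewer than 9 values are available in the window at those positions.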

In the case of many columns and a small window:

import pandas as pd
import numpy as np

wsize = 3
# Place rows t, t+1, ..., t+wsize-1 side by side so each row holds one window.
df2 = pd.concat([df.shift(-x) for x in range(wsize)], axis=1)
s_quant = df2.quantile(0.75, axis=1)

# Only necessary if you need to enforce sufficient data.
s_quant[df2.isnull().any(axis=1)] = np.nan

Output: s_quant

0    1.12
1    0.97
2    0.97
3     NaN
4     NaN
Name: 0.75, dtype: float64
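The shift-and-concatenate idea can be wrapped in a small helper so that the window size and the quantile are parameters. This is only a sketch; the function name forward_window_quantile is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd

def forward_window_quantile(df, wsize, q):
    """Quantile q over all values in rows t..t+wsize-1, for every row t."""
    # Row t of `shifted` holds rows t, t+1, ..., t+wsize-1 of df side by side.
    shifted = pd.concat([df.shift(-k) for k in range(wsize)], axis=1)
    out = shifted.quantile(q, axis=1)
    # Blank out rows whose window runs past the end of the data.
    incomplete = shifted.isna().any(axis=1)
    return out.mask(incomplete)

df = pd.DataFrame({
    "A": [0.63, 2.20, 0.97, -0.78, -0.06],
    "B": [1.12, -2.16, -0.68, -1.22, -0.02],
    "C": [1.73, -0.13, 1.09, 0.96, 2.18],
})
result = forward_window_quantile(df, wsize=3, q=0.75)
print(result)
```

Note that the mask also blanks any window containing a missing value in the original data, which may or may not be what you want.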

You can use numpy ravel to flatten each window into a single array. You may still have to use a for loop, though.

for i in range(0,3):
    print(df.iloc[i:i+3].values.ravel())

If t advances in steps of 3 (non-overlapping windows), you can use the numpy reshape function to create an n*9 DataFrame.
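A minimal sketch of the reshape idea for non-overlapping windows follows. The 6-row frame df6 is made-up random data (the sample frame in the question has only 5 rows, which is not a multiple of 3), and trailing rows that do not fill a complete block are dropped, which is an assumption about the desired behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 6 rows so that two full blocks of 3 rows exist.
rng = np.random.default_rng(0)
df6 = pd.DataFrame(rng.normal(size=(6, 3)), columns=list("ABC"))

wsize = 3
values = df6.to_numpy()
n_blocks = len(df6) // wsize

# Each block of wsize consecutive rows becomes one row of wsize * ncols values.
blocks = values[: n_blocks * wsize].reshape(n_blocks, wsize * values.shape[1])

# Label each block by the index of its first row (t = 0, 3, ...).
wide = pd.DataFrame(blocks, index=df6.index[::wsize][:n_blocks])
p75 = wide.quantile(0.75, axis=1)
print(p75)
```

Because the array is reshaped in row-major order, no Python-level loop is needed, but this only covers t = 0, 3, 6, ... and not the overlapping windows asked for in the question.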
