Pandas - Using `.rolling()` on multiple columns
Consider a pandas DataFrame that looks like the one below:
A B C
0 0.63 1.12 1.73
1 2.20 -2.16 -0.13
2 0.97 -0.68 1.09
3 -0.78 -1.22 0.96
4 -0.06 -0.02 2.18
I would like to use the function .rolling() to perform the following calculation for t = 0, 1, 2:
Select the rows from t to t+2.
Collect the values from all columns of those rows into a set S (9 values here).
Compute the 75th percentile of S (or other summary statistics about S).
For instance, for t = 1 we have S = {2.2, -2.16, -0.13, 0.97, -0.68, 1.09, -0.78, -1.22, 0.96}, and the 75th percentile is 0.97.
I couldn't find a way to make it work with .rolling(), since it apparently processes each column separately. I'm now relying on a for loop, but it is really slow.
Do you have any suggestions for a more efficient approach?
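For reference, the loop-based baseline mentioned above might look like the sketch below. The asker's actual loop is not shown, so the variable names (`wsize`, `loop_result`) and the use of `np.quantile` are assumptions. The DataFrame is rebuilt from the values printed in the question.

```python
import numpy as np
import pandas as pd

# Example frame, copied from the values shown in the question.
df = pd.DataFrame({
    "A": [0.63, 2.20, 0.97, -0.78, -0.06],
    "B": [1.12, -2.16, -0.68, -1.22, -0.02],
    "C": [1.73, -0.13, 1.09, 0.96, 2.18],
})

wsize = 3  # number of rows per forward-looking window (t to t+2)

# Start with NaN everywhere; rows without a full window stay NaN.
loop_result = pd.Series(np.nan, index=df.index)
for t in range(len(df) - wsize + 1):
    # S holds every value from rows t..t+wsize-1, across all columns.
    S = df.iloc[t:t + wsize].to_numpy().ravel()
    loop_result.iloc[t] = np.quantile(S, 0.75)

print(loop_result)
```

This is the behaviour any faster approach needs to reproduce: one statistic per starting row, with NaN where fewer than `wsize` rows remain.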
One solution is to stack the data, multiply your window size by the number of columns, and then take every cols-th element of the result (one value per original row). Also, since you want a forward-looking window, reverse the order of the stacked DataFrame first.
wsize = 3
cols = len(df.columns)
df.stack(dropna=False)[::-1].rolling(window=wsize*cols).quantile(0.75)[cols-1::cols].reset_index(-1, drop=True).sort_index()
Output:
0 1.12
1 0.97
2 0.97
3 NaN
4 NaN
dtype: float64
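To see why the window is multiplied by the number of columns and the result sliced with a step of `cols`, here is a rough sketch of the same idea on a plain flattened array. It uses `to_numpy().ravel()` instead of `.stack()`, which is a substitution made for illustration and not the answer's exact code. The names `flat`, `rolled`, and `result` are made up for this example.

```python
import numpy as np
import pandas as pd

# Example frame, copied from the values shown in the question.
df = pd.DataFrame({
    "A": [0.63, 2.20, 0.97, -0.78, -0.06],
    "B": [1.12, -2.16, -0.68, -1.22, -0.02],
    "C": [1.73, -0.13, 1.09, 0.96, 2.18],
})

wsize = 3
cols = df.shape[1]

# Flatten row by row, then reverse so a backward-looking rolling window
# over the reversed data looks forward in the original order.
flat = pd.Series(df.to_numpy().ravel()[::-1])

# A window of wsize * cols elements spans exactly wsize complete rows
# whenever it ends on the first column of a row.
rolled = flat.rolling(window=wsize * cols).quantile(0.75)

# Every cols-th element (starting at cols - 1) is such a row boundary;
# reverse again to restore the original row order.
result = pd.Series(rolled.to_numpy()[cols - 1::cols][::-1], index=df.index)

print(result)
```

Positions that are not row boundaries mix partial rows, which is why they are discarded by the slice.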
In the case of many columns and a small window:
import pandas as pd
import numpy as np
wsize = 3
# Place the frame next to copies of itself shifted up by 1..wsize-1 rows.
df2 = pd.concat([df.shift(-x) for x in range(wsize)], axis=1)
s_quant = df2.quantile(0.75, axis=1)
# Only necessary if you need to enforce sufficient data.
s_quant[df2.isnull().any(axis=1)] = np.nan
Output (s_quant):
0 1.12
1 0.97
2 0.97
3 NaN
4 NaN
Name: 0.75, dtype: float64
You can use numpy ravel to flatten each window, although you may still need a for loop:
for i in range(0, 3):
    print(df.iloc[i:i+3].values.ravel())
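If the loop itself is the bottleneck, one possible way to build all flattened windows at once is `numpy.lib.stride_tricks.sliding_window_view` (available in NumPy 1.20 and later). This function is not mentioned in the answer; the sketch below is an assumption about how it could be applied here, and the names `windows`, `flat`, and `vectorized` are made up.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Example frame, copied from the values shown in the question.
df = pd.DataFrame({
    "A": [0.63, 2.20, 0.97, -0.78, -0.06],
    "B": [1.12, -2.16, -0.68, -1.22, -0.02],
    "C": [1.73, -0.13, 1.09, 0.96, 2.18],
})

wsize = 3
values = df.to_numpy()

# Shape (n_windows, n_cols, wsize): one slab per starting row t.
windows = sliding_window_view(values, window_shape=wsize, axis=0)

# Flatten each slab to a single row of wsize * n_cols values
# (the equivalent of calling ravel() on each window).
flat = windows.reshape(windows.shape[0], -1)

# One 75th percentile per window, computed without a Python loop.
q75 = np.quantile(flat, 0.75, axis=1)

# Pad with NaN for the trailing rows that lack a full window.
vectorized = pd.Series(np.nan, index=df.index)
vectorized.iloc[:len(q75)] = q75

print(vectorized)
```

The reshape copies the data, so memory use grows with `wsize` times the size of the frame.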
If your t steps in 3s (that is, the windows do not overlap), you can use the numpy reshape function to create an n*9 DataFrame.
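A minimal sketch of the reshape idea for non-overlapping windows follows. The 6-row frame `df6` is made-up illustration data (not from the question), chosen so the row count divides evenly by the window size. The names `n_blocks`, `blocks`, and `block_q75` are also hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up 6x3 frame holding 0..17, so the expected values are easy to check.
df6 = pd.DataFrame(np.arange(18).reshape(6, 3), columns=list("ABC"))

wsize = 3
n_blocks = len(df6) // wsize  # number of complete, non-overlapping windows

# Drop any leftover rows, then put each block of wsize rows on one line:
# the result has n_blocks rows and wsize * n_cols (here 9) columns.
blocks = df6.to_numpy()[: n_blocks * wsize].reshape(n_blocks, wsize * df6.shape[1])

# One 75th percentile per block.
block_q75 = np.quantile(blocks, 0.75, axis=1)

print(pd.DataFrame(blocks))
print(block_q75)
```

This only covers t = 0, 3, 6, and so on; overlapping windows need one of the other approaches.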