Using rolling_apply with a function that requires 2 arguments in Pandas
I'm trying to use rolling_apply with a function that requires two arguments. To my knowledge, the only way (short of writing the formula from scratch) to calculate the Kendall tau correlation with the standard tie correction included is:
>>> import scipy.stats
>>> x = [5.05, 6.75, 3.21, 2.66]
>>> y = [1.65, 26.5, -5.93, 7.96]
>>> z = [1.65, 2.64, 2.64, 6.95]
>>> print(scipy.stats.kendalltau(x, y)[0])
0.333333333333
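For reference, the tie-corrected statistic that scipy.stats.kendalltau returns by default is Kendall's tau-b: (n_c - n_d) / sqrt((n_0 - n_1) * (n_0 - n_2)), where n_c and n_d count concordant and discordant pairs, n_0 = n(n - 1)/2 is the total number of pairs, and n_1 and n_2 count the pairs tied in x and in y respectively. A minimal brute-force sketch of that formula in plain Python follows; kendall_tau_b is a hypothetical helper, not a SciPy function, and its O(n^2) pair loop is only meant to show what is being computed.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b by brute-force pair counting (hypothetical helper, O(n^2))."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n_pairs = concordant = discordant = 0
    tied_x = tied_y = 0  # pairs tied in x, pairs tied in y (a joint tie counts in both)
    for i, j in combinations(range(len(x)), 2):
        n_pairs += 1
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0:
            tied_x += 1
        if dy == 0:
            tied_y += 1
        if dx == 0 or dy == 0:
            continue  # tied pairs are neither concordant nor discordant
        if (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denom == 0:
        return float("nan")  # a constant column has no defined correlation
    return (concordant - discordant) / denom


x = [5.05, 6.75, 3.21, 2.66]
y = [1.65, 26.5, -5.93, 7.96]
z = [1.65, 2.64, 2.64, 6.95]
print(kendall_tau_b(x, y))  # 0.3333333333333333
print(kendall_tau_b(x, z))  # the tie in z shrinks the denominator to sqrt(6 * 5)
```

With x and z, one pair is concordant, four are discordant and one is tied in z, so the result is -3 / sqrt(30) rather than -3 / 6.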
I'm also aware of the problem with rolling_apply and functions that take two arguments, as documented here:
Still, I'm struggling to find a way to do the kendalltau calculation on a rolling basis for a DataFrame with multiple columns.
My DataFrame looks something like this:
import pandas as pd
A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])
I'm trying to create a function that does this:
In [1]: function(A, 3)  # A is the DataFrame, 3 is the rolling window
Out[1]:
A B C AB AC BC
1 1 5 2 NaN NaN NaN
2 2 4 4 NaN NaN NaN
3 3 3 1 -0.99 -0.33 0.33
4 4 2 2 -0.99 -0.33 0.33
5 5 1 4 -0.99 0.99 -0.99
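If the goal is just the table above, a plain loop over window end positions sidesteps rolling_apply entirely. The sketch below assumes string column labels (so names like 'AB' can be built by concatenation) and uses DataFrame.corr(method='kendall'), which delegates to SciPy's kendalltau and therefore needs SciPy installed. rolling_kendall is a hypothetical name standing in for the question's function.

```python
import itertools

import numpy as np
import pandas as pd


def rolling_kendall(df, window):
    """Hypothetical helper: pairwise Kendall tau over a trailing window of rows."""
    out = df.copy()
    pairs = list(itertools.combinations(df.columns, 2))
    for col1, col2 in pairs:
        out[col1 + col2] = np.nan  # the first window - 1 rows stay NaN
    for end in range(window, len(df) + 1):
        # Kendall correlation matrix of the rows at positions end - window .. end - 1
        corr = df.iloc[end - window:end].corr(method="kendall")
        label = df.index[end - 1]  # result goes on the last row of the window
        for col1, col2 in pairs:
            out.at[label, col1 + col2] = corr.loc[col1, col2]
    return out


A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])
result = rolling_kendall(A, 3)
print(result)
```

Each window computes the full correlation matrix once and reads every pair from it, which is simple but recomputes everything per window; that is usually fine for a handful of columns.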
As a very preliminary approach, I considered defining the function like this:
import numpy as np
import scipy.stats as stats

def tau1(x):
    y = np.array(A['A'])  # keep one column fixed and run it against the other two
    tau, p_value = stats.kendalltau(x, y)
    return tau

A['AB'] = pd.rolling_apply(A['B'], 3, lambda x: tau1(x))
Of course, it didn't work. I got:
ValueError: all keys need to be the same shape
I understand this is not a trivial problem. I appreciate any input.
As of pandas 0.14, rolling_apply only passes NumPy arrays to the function. A possible workaround is to pass np.arange(len(A)) as the first argument to rolling_apply, so that the tau function receives the positions of the rows you wish to use. Then, within the tau function,
B = A[[col1, col2]].iloc[idx]
returns a DataFrame containing all the required rows.
import numpy as np
import pandas as pd
import scipy.stats as stats
import itertools as IT
A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
columns=['A', 'B', 'C'], index = [1, 2, 3, 4, 5])
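for col1, col2 in IT.combinations(A.columns, 2):
    # idx holds the row positions of the current window
    def tau(idx):
        B = A[[col1, col2]].iloc[idx]
        return stats.kendalltau(B[col1], B[col2])[0]
    # roll over row positions rather than values, then look the rows up inside tau
    A[col1+col2] = pd.rolling_apply(np.arange(len(A)), 3, tau)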
print(A)
yields
A B C AB AC BC
1 1 5 2 NaN NaN NaN
2 2 4 4 NaN NaN NaN
3 3 3 1 -1 -0.333333 0.333333
4 4 2 2 -1 -0.333333 0.333333
5 5 1 4 -1 1.000000 -1.000000
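The top-level pd.rolling_apply has since been removed from pandas in favour of the .rolling() method. The sketch below ports the same row-position trick to the newer API, assuming a pandas version where Rolling.apply accepts raw=True (which hands the function plain NumPy arrays, as the old function did). Rolling windows arrive as float64, so the positions are cast back to int before .iloc; the col1=col1, col2=col2 defaults simply pin the current pair inside each closure.

```python
import itertools

import numpy as np
import pandas as pd
import scipy.stats as stats

A = pd.DataFrame([[1, 5, 2], [2, 4, 4], [3, 3, 1], [4, 2, 2], [5, 1, 4]],
                 columns=['A', 'B', 'C'], index=[1, 2, 3, 4, 5])

# Series of row positions sharing A's index; rolling over it yields windows of positions
positions = pd.Series(np.arange(len(A)), index=A.index)

for col1, col2 in itertools.combinations(list(A.columns), 2):
    def tau(idx, col1=col1, col2=col2):
        # idx is a float64 array of positions, so cast it back to int for .iloc
        window = A[[col1, col2]].iloc[idx.astype(int)]
        return stats.kendalltau(window[col1], window[col2])[0]
    A[col1 + col2] = positions.rolling(3).apply(tau, raw=True)

print(A)
```

Because the rolled Series shares A's index, the results line up with the original rows, and the first two rows are NaN just as with rolling_apply.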