Pearsonr: TypeError: No loop matching the specified signature and casting was found for ufunc add
I have a time-series Pandas DataFrame called "df". It has one column and the following shape: (2000, 1). The head of the DataFrame, below, shows its structure:
Weight
Date
2004-06-01 1.9219
2004-06-02 1.8438
2004-06-03 1.8672
2004-06-04 1.7422
2004-06-07 1.8203
Goal
I am trying to use a for loop to calculate the correlation between the percentage change of the "Weight" variable over various timeframes or time lags. This is being done to evaluate the impact of holding livestock over time periods of various lengths.
The loop can be found below:
from scipy.stats import pearsonr

# Loop for producing combinations of different timelags and holddays
# and calculating the pearsonr correlation and p-value of each combination
for timelags in [1, 5, 10, 25, 60, 120, 250]:
    for holddays in [1, 5, 10, 25, 60, 120, 250]:
        weight_change_lagged = df.pct_change(periods=timelags)
        weight_change_future = df.shift(-holddays).pct_change(periods=holddays)
        # Sample rows at the shorter of the two periods
        if timelags >= holddays:
            indepSet = range(0, weight_change_lagged.shape[0], holddays)
        else:
            indepSet = range(0, weight_change_lagged.shape[0], timelags)
        weight_change_lagged = weight_change_lagged.iloc[indepSet]
        weight_change_future = weight_change_future.iloc[indepSet]
        not_na = (weight_change_lagged.notna() & weight_change_future.notna()).values
        (correlation, p_value) = pearsonr(weight_change_lagged[not_na], weight_change_future[not_na])
        print('%4i %4i %7.4f %7.4f' % (timelags, holddays, correlation, p_value))
The loop itself runs, but it fails when it comes to calculating the pearsonr correlation and p-value, i.e. at this line:
(correlation, p_value) = pearsonr(weight_change_lagged[not_na], weight_change_future[not_na])
It generates this error:
TypeError: no loop matching the specified signature and casting was found for ufunc add
Does anyone have any clues on how to fix my problem? I looked through the forums and found no answers that fit my exact requirements.
Through random tinkering, I managed to solve my problem as follows:
scipy's pearsonr function only accepts arrays or array-like inputs. This means that one-dimensional inputs, such as a Pandas Series of each variable, work. However, complete Pandas DataFrames of the variables, even if they contain only one column, do not work.
So, I edited the problematic segment of the code as follows:
# Define an object containing observations that are not NA
not_na = (weight_change_lagged.notna() & weight_change_future.notna()).values
# Remove NA values before passing the data to the pearsonr function (not within the call, as I had done):
weight_change_lagged = weight_change_lagged[not_na]
weight_change_future = weight_change_future[not_na]
# Pass Pandas Series of the future and lagged variables to the function
(correlation, p_value) = pearsonr(weight_change_lagged['Weight'], weight_change_future['Weight'])
With just that slight modification, the code executes without a hitch.
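For anyone who wants to run the whole thing end to end, here is a minimal sketch of the corrected loop. The data is a synthetic random walk that I made up to stand in for the real "Weight" column (an assumption, not the asker's data), and it selects the "Weight" Series up front rather than after filtering, which should be equivalent for a one-column frame.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Synthetic stand-in for the asker's data (assumption): 2000 business days
# of a strictly positive random-walk "Weight" series.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2004-06-01", periods=2000, name="Date")
weights = 2.0 * np.exp(np.cumsum(rng.normal(0.0, 0.01, size=2000)))
df = pd.DataFrame({"Weight": weights}, index=dates)

results = []
for timelag in [1, 5, 10, 25, 60, 120, 250]:
    for holddays in [1, 5, 10, 25, 60, 120, 250]:
        # Work with the 1-D "Weight" Series, not the one-column DataFrame.
        lagged = df["Weight"].pct_change(periods=timelag)
        future = df["Weight"].shift(-holddays).pct_change(periods=holddays)

        # Sample every min(timelag, holddays) rows, as in the original loop.
        step = min(timelag, holddays)
        lagged = lagged.iloc[::step]
        future = future.iloc[::step]

        # Keep only the rows where both values are present.
        not_na = (lagged.notna() & future.notna()).to_numpy()
        correlation, p_value = pearsonr(lagged[not_na], future[not_na])

        results.append((timelag, holddays, correlation, p_value))
        print("%4i %4i %7.4f %7.4f" % (timelag, holddays, correlation, p_value))
```

Because both inputs are Series, the boolean mask is one-dimensional too, so the filtering step needs no special handling.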
Note:
If you use double square brackets, as follows, you are passing a Pandas DataFrame, not a Series, and the pearsonr function will throw an error:
weight_change_future[['Weight']]
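To see the difference between the two bracket styles, here is a small sketch. The frame is a hypothetical one rebuilt from the five rows shown in the question.

```python
import pandas as pd

# Hypothetical frame rebuilt from the head() shown in the question.
df = pd.DataFrame(
    {"Weight": [1.9219, 1.8438, 1.8672, 1.7422, 1.8203]},
    index=pd.to_datetime(
        ["2004-06-01", "2004-06-02", "2004-06-03", "2004-06-04", "2004-06-07"]
    ),
)

as_series = df["Weight"]    # single brackets: one-dimensional Series
as_frame = df[["Weight"]]   # double brackets: two-dimensional DataFrame

print(type(as_series).__name__, as_series.shape)  # Series (5,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (5, 1)
```

The trailing 1 in the DataFrame's shape is the extra dimension that pearsonr trips over.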
Thanks to everyone who tried to help; your questions led me to the answer.
In my case, it wasn't a data type issue; instead, it was caused by a wrong dimension. Thanks to the article https://programmersought.com/article/67803965109/
You may face this error even if you pass numpy arrays to the function. It turns out that an "extra" dimension in the numpy array causes this problem. Checking the shape of the array shows it:
np_data.shape
>> (391, 1)
This (.., 1) is the root of the problem. You can remove this dimension with np.squeeze(np_data), which keeps only the values of the array as a one-dimensional array:
np.squeeze(np_data).shape
>> (391,)
To conclude, the solution would be to use:
pearson, pvalue = pearsonr(np.squeeze(np_data_a), np.squeeze(np_data_b))
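Here is a self-contained sketch of that fix. The two arrays are random data I generated with the same (391, 1) shape as above; they are placeholders, not the data from my case.

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder column vectors of shape (391, 1) (assumption: random data).
rng = np.random.default_rng(0)
np_data_a = rng.normal(size=(391, 1))
np_data_b = 0.5 * np_data_a + rng.normal(size=(391, 1))

# Drop the length-1 axis so each input is one-dimensional.
flat_a = np.squeeze(np_data_a)
flat_b = np.squeeze(np_data_b)
print(np_data_a.shape, flat_a.shape)

pearson, pvalue = pearsonr(flat_a, flat_b)
print(pearson, pvalue)
```

np_data.ravel() or np_data.reshape(-1) should flatten the array just as well. They may be the safer choice when an array can have a single row, because np.squeeze turns a (1, 1) array into a zero-dimensional one.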