Pearsonr: TypeError: No loop matching the specified signature and casting was found for ufunc add

I have a timeseries Pandas dataframe called "df". It has one column and the following shape: (2000, 1). The head of the dataframe, below, shows its structure:

            Weight
Date    
2004-06-01  1.9219
2004-06-02  1.8438
2004-06-03  1.8672
2004-06-04  1.7422
2004-06-07  1.8203

Goal

I am trying to use a "for-loop" to calculate the correlation between the percentage change of the "Weight" variable over various timeframes or timelags. This is being done to evaluate the impact of holding livestock over time periods of various lengths.

The loop can be found below:

from scipy.stats import pearsonr

# Loop for producing combinations of different timelags and holddays 
# and calculating the pearsonr correlation and p-value of each combination 

for timelags in [1, 5, 10, 25, 60, 120, 250]:
    for holddays in [1, 5, 10, 25, 60, 120, 250]:
        weight_change_lagged = df.pct_change(periods=timelags)
        weight_change_future = df.shift(-holddays).pct_change(periods=holddays)

        if (timelags >= holddays):
            indepSet=range(0, weight_change_lagged.shape[0], holddays)
        else:
            indepSet=range(0, weight_change_lagged.shape[0], timelags)

        weight_change_lagged = weight_change_lagged.iloc[indepSet]
        weight_change_future = weight_change_future.iloc[indepSet]

        not_na = (weight_change_lagged.notna() & weight_change_future.notna()).values

        (correlation, p_value) = pearsonr(weight_change_lagged[not_na], weight_change_future[not_na])
        print('%4i %4i %7.4f %7.4f' % (timelags, holddays, correlation, p_value))

The loop runs until it reaches the pearsonr correlation and p-value calculation, i.e. it fails at this line:

(correlation, p_value) = pearsonr(weight_change_lagged[not_na], weight_change_future[not_na])

It generates this error:

TypeError: no loop matching the specified signature and casting was found for ufunc add

Does anyone have any clues on how to fix my problem? I looked through the forums and found no answers that fit my exact requirements.

Through random tinkering, I managed to solve my problem as follows:

scipy's pearsonr function expects each input to be a one-dimensional array or array-like. This means that:

  • One-dimensional Numpy arrays of the input variables work.
  • Pandas Series of the input variables work.

However, complete Pandas Dataframes of the variables do not work, even if they contain only one column, because a one-column Dataframe is still two-dimensional, with shape (n, 1).

So, I edited the problematic segment of the code as follows:

# Define an object containing observations that are not NA
not_na = (weight_change_lagged.notna() & weight_change_future.notna()).values

# Remove NA values before passing the data to the pearsonr function (not inside the call, as I had done):
weight_change_lagged = weight_change_lagged[not_na]
weight_change_future = weight_change_future[not_na]

# Input Pandas Series of the Future and Lagged Variables into the function
(correlation, p_value) = pearsonr(weight_change_lagged['Weight'], weight_change_future['Weight'])

With just that slight modification, the code executes without hitches.
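
For anyone who wants to run the whole thing end to end, here is a minimal sketch of the loop working on a Series from the start. The random-walk data is an assumption standing in for the original "Weight" column, which is not available here, and `results` is a hypothetical list added only to collect the output. The sampling step is written as min(timelags, holddays), which is the same as the if/else in the question.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# Synthetic stand-in for the original data (assumption: random-walk weights).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2004-06-01", periods=2000)
weights = 2.0 * np.exp(np.cumsum(rng.normal(0, 0.01, size=len(dates))))
df = pd.DataFrame({"Weight": weights}, index=pd.Index(dates, name="Date"))

weight = df["Weight"]  # single brackets: a 1-D Series, not a DataFrame

results = []  # hypothetical container, not part of the original code
for timelags in [1, 5, 10, 25, 60, 120, 250]:
    for holddays in [1, 5, 10, 25, 60, 120, 250]:
        lagged = weight.pct_change(periods=timelags)
        future = weight.shift(-holddays).pct_change(periods=holddays)

        # Sample every min(timelags, holddays) rows, as in the original loop.
        step = min(timelags, holddays)
        lagged = lagged.iloc[::step]
        future = future.iloc[::step]

        # Keep only the rows where both series are defined.
        not_na = lagged.notna() & future.notna()
        correlation, p_value = pearsonr(lagged[not_na], future[not_na])
        results.append((timelags, holddays, correlation, p_value))
        print('%4i %4i %7.4f %7.4f' % (timelags, holddays, correlation, p_value))
```

Because everything stays a Series, the boolean mask is one-dimensional as well, so there is no need for .values or for selecting the 'Weight' column again before calling pearsonr.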

Note:

If you use double square brackets, as follows, you get a one-column pandas Dataframe rather than a Series, and the pearsonr function will throw the same error:

weight_change_future[['Weight']]
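
A quick way to see the difference is to compare the shapes. This is a small sketch using the five values from the head of the dataframe in the question; the variable names `series` and `frame` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Weight": [1.9219, 1.8438, 1.8672, 1.7422, 1.8203]})

series = df["Weight"]    # single brackets -> Series
frame = df[["Weight"]]   # double brackets -> one-column DataFrame

print(type(series).__name__, series.shape)  # Series (5,)
print(type(frame).__name__, frame.shape)    # DataFrame (5, 1)

# Two ways to turn a one-column DataFrame back into a Series:
from_column = frame["Weight"]
from_squeeze = frame.squeeze("columns")
print(from_column.shape, from_squeeze.shape)  # (5,) (5,)
```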

Thanks to everyone who tried to help; your questions led me to the answer.

In my case, it wasn't a data type issue; it was caused by the arrays having the wrong number of dimensions. Thanks to the article https://programmersought.com/article/67803965109/

You may face this error even if you pass numpy arrays to the function. It turns out that an "extra" dimension in the numpy array causes this problem. For example:

np_data.shape
>> (391, 1)

This trailing (..., 1) is the root of the problem. You can remove this dimension with np.squeeze(np_data), which leaves a one-dimensional array of the values:

np.squeeze(np_data).shape
>> (391,)

To conclude, the solution is to use:

pearson, pvalue = pearsonr(np.squeeze(np_data_a), np.squeeze(np_data_b))
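
Here is a self-contained sketch of the same fix. The arrays are random data generated only for illustration (assumption: the real np_data_a and np_data_b are column vectors of shape (391, 1), as above), and `flat_a` / `flat_b` are hypothetical names.

```python
import numpy as np
from scipy.stats import pearsonr

# Illustrative column vectors with the problematic (391, 1) shape.
rng = np.random.default_rng(42)
np_data_a = rng.normal(size=(391, 1))
np_data_b = 0.5 * np_data_a + rng.normal(size=(391, 1))

flat_a = np.squeeze(np_data_a)  # drops every axis of length 1
flat_b = np_data_b.ravel()      # flattens to 1-D; equivalent here

print(flat_a.shape, flat_b.shape)  # (391,) (391,)

pearson, pvalue = pearsonr(flat_a, flat_b)
print(pearson, pvalue)
```

One thing to keep in mind: np.squeeze removes all axes of length 1, so an array of shape (1, 1) becomes zero-dimensional. ravel() or reshape(-1) always gives a one-dimensional result, which may be the safer choice if the number of rows can vary.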
