Why is the p-value 1 for highly correlated data? What is going wrong?

Question

I am trying to filter a correlation matrix by its p-values, using the following matrix:

import numpy as np
from scipy.stats import pearsonr
A=np.array([[ 6.02,  5.32],
       [12.18, 12.13],
       [11.08, 10.54],
       [ 9.03,  8.95],
       [ 6.08,  6.94]])

I use the following code:

def get_corr(M, g=1):

    n =np.shape(M)[0]
    out = np.empty(np.shape(M)[0])
    out_p = np.empty(np.shape(M)[0])

    out1 = np.zeros(shape=(np.shape(M)[0],np.shape(M)[0]))
    P1 = np.zeros(shape=(np.shape(M)[0],np.shape(M)[0]))
    for p in range(np.shape(M)[0]):
        for i in range(np.shape(M)[0]):

            PearsonCorrCoeff, pval = pearsonr(M[p,:], M[i,:])            
            aux = PearsonCorrCoeff
            out_p[i]= pval
            out[i] = 0 if np.isnan(aux) else aux 
            if g==1:
                if pval < (0.01):#/N:
                  aux = aux
                else: 
                  aux = 0
                  out[i] = 0 if np.isnan(aux) else aux   
            else:      
                  out[i] = 0 if np.isnan(aux) else aux    
        out1[p] = out 
        P1[p] = out_p
    return out1,P1
corr_A, P_A = get_corr(A)

But the result I get is strange. The unfiltered correlation matrix is

corr_A=array([[ 1., -1.,  1., -1.,  1.],
       [-1.,  1., -1.,  1., -1.],
       [ 1., -1.,  1., -1.,  1.],
       [-1.,  1., -1.,  1., -1.],
       [ 1., -1.,  1., -1.,  1.]])

and the p-value matrix is

P_A=array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

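With correlations this strong I expected every p-value to be zero, and I can't tell what is causing this. Has anyone run into the same problem before?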

Answer 1

To elaborate on @Marat's comment, you probably want to correlate the columns rather than the rows:

pearsonr(M[:,p], M[:,i])
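As a minimal sketch of that change, assuming the goal is a column-by-column correlation matrix filtered at the question's 0.01 threshold. The helper name `column_corr` and its `alpha` parameter are illustrative and not part of the original code:

```python
import numpy as np
from scipy.stats import pearsonr

A = np.array([[ 6.02,  5.32],
              [12.18, 12.13],
              [11.08, 10.54],
              [ 9.03,  8.95],
              [ 6.08,  6.94]])

def column_corr(M, alpha=0.01):
    """Correlate the columns of M and zero out entries with p >= alpha.

    Hypothetical helper for illustration; `alpha` mirrors the 0.01
    threshold used in the question.
    """
    k = M.shape[1]                 # number of columns (variables)
    corr = np.zeros((k, k))
    pvals = np.ones((k, k))
    for p in range(k):
        for i in range(k):
            # Each column holds 5 observations, so the test is meaningful.
            r, pval = pearsonr(M[:, p], M[:, i])
            pvals[p, i] = pval
            # A NaN p-value fails the comparison and is stored as 0.
            corr[p, i] = r if pval < alpha else 0.0
    return corr, pvals

corr_A, P_A = column_corr(A)
print(corr_A)
print(P_A)
```

The result is a 2x2 matrix because `A` has two columns. If only the coefficients are needed, `np.corrcoef(A, rowvar=False)` computes them without p-values.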

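Why is -1/1 what you should expect from the original row-wise call? Each row of A holds just two values, so think about the case where x and y are only two values each, and about fitting a line of best fit through a plot of those values. Something like: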

import numpy as np
import matplotlib.pyplot as plt

A = np.random.randn(2,2)

x = A[0]
y = A[1]

lines = plt.plot(x, y, "-o")  # two points joined by a straight line
lines[0].axes.set(xlabel="x", ylabel="y")
plt.show()

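The line passes through both points exactly, so the fit is perfect and r is always +1 or -1, whatever the data.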
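This can be checked numerically. The sketch below (the seed and the number of draws are arbitrary choices) correlates random two-element vectors. As far as I know, SciPy treats n = 2 as a special case and reports a p-value of 1, since a result of ±1 is guaranteed and therefore carries no evidence of a real relationship:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

results = []
for _ in range(5):
    x, y = rng.standard_normal((2, 2))  # two vectors of length 2
    r, pval = pearsonr(x, y)
    results.append((float(r), float(pval)))
    print(f"r = {r:+.1f}, p = {pval:.1f}")
```

Every draw should show a coefficient of magnitude 1 together with a p-value of 1, which is the pattern in the question's `corr_A` and `P_A`.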

You were probably expecting something like this:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

B = np.random.randn(2,300)

x = B[0]
y = B[1]

print(pearsonr(x,y))

lines = plt.plot(x, y, "o")  # scatter of 300 independent points
lines[0].axes.set(xlabel="x", ylabel="y", title="With >two values")
plt.show()

As expected, there is not much correlation.

1 2022-08-17 14:37:54