In Python, how do we find the correlation coefficient between two matrices?

I have two matrices, T1 and T2, each of size m x n. I want to find the correlation coefficient between the two matrices.
So far I haven't used any built-in library function for it. I am doing the following steps:
First I calculate the mean of each matrix:

M1 = T1.mean()
M2 = T2.mean()

and then I subtract the mean from the corresponding matrices as:

A = np.subtract(T1, M1)
B = np.subtract(T2, M2)

where np is the numpy library and A and B are the resulting matrices after the subtraction.
Now I calculate the correlation coefficient as:

alpha = np.sum(A*B) / (np.sqrt((np.sum(A))*np.sum(B)))

However, the value I get is far greater than 1 and is not meaningful at all. A correlation coefficient should lie between -1 and 1 to be meaningful.
I have also tried using the absolute values of matrices A and B, but that didn't work either.
I also tried to use:

np.sum(np.dot(A,B.T)) instead of np.sum(A*B)  

in the numerator, but that also didn't work.
Edit1:
This is the formula that I intend to calculate:
[Image: the actual formula to be calculated]

In this image, C is one of the matrices and T is the other.
'u' denotes the mean.

Can somebody tell me where I am making the mistake?

Can you try this:

import numpy as np
x = np.array([[0.1, .32, .2, 0.4, 0.8], [.23, .18, .56, .61, .12]])
y = np.array([[2,4,0.1, .32, .2],[1,3,.23, .18, .56]])
pearson = np.corrcoef(x,y)
print(pearson)

Well, I think this function does what I intended:

def correlation_coefficient(T1, T2):
    numerator = np.mean((T1 - T1.mean()) * (T2 - T2.mean()))
    denominator = T1.std() * T2.std()
    if denominator == 0:
        return 0
    else:
        result = numerator / denominator
        return result

The numerator is computed in a way that does not exactly mirror the formula in the image above, and the denominator is just the product of the standard deviations of the two images.
However, the result makes sense now, as it stays within the valid range for a correlation coefficient, -1 to 1.
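
For comparison, the original attempt divides by np.sqrt(np.sum(A) * np.sum(B)). Because A and B are mean-centered, both sums are close to zero, which is likely why the value blows up. Below is a minimal sketch of the sum-of-squares form, assuming the formula in the image is the standard Pearson coefficient. The helper name pearson_2d and the random test matrices are made up for illustration.

```python
import numpy as np

def pearson_2d(T1, T2):
    """Pearson correlation of two equally shaped arrays, treating all elements as samples."""
    A = T1 - T1.mean()  # mean-centered copy of T1
    B = T2 - T2.mean()  # mean-centered copy of T2
    # The denominator uses sums of SQUARED deviations, not sums of deviations.
    denom = np.sqrt(np.sum(A ** 2) * np.sum(B ** 2))
    if denom == 0:
        return 0.0  # same convention as correlation_coefficient above
    return float(np.sum(A * B) / denom)

# Hypothetical data, only to exercise the function.
rng = np.random.default_rng(0)
T1 = rng.random((4, 5))
T2 = rng.random((4, 5))
r = pearson_2d(T1, T2)
print(r)
```

Dividing both the numerator and the denominator by the number of elements gives the mean/std form used in correlation_coefficient, so the two should agree up to floating-point error.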

From the way the problem is described in the OP, the matrices are treated as arrays, so one could simply flatten them:

x = T1.flatten()
y = T2.flatten()

One could then use either the builtin numpy function proposed by @AakashMakwana:

import numpy as np
r = np.corrcoef(x, y)[0,1]

Remark: Note that without flattening this solution would produce the matrix of pairwise correlations.
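
To illustrate the remark, here is a small sketch that reuses the example arrays from the earlier answer. With 2-D inputs, np.corrcoef treats every row as a separate variable, so two 2x5 arrays give a 4x4 matrix rather than a single number.

```python
import numpy as np

x = np.array([[0.1, 0.32, 0.2, 0.4, 0.8], [0.23, 0.18, 0.56, 0.61, 0.12]])
y = np.array([[2, 4, 0.1, 0.32, 0.2], [1, 3, 0.23, 0.18, 0.56]])

# Without flattening: the 2 rows of x and the 2 rows of y become 4 variables.
full = np.corrcoef(x, y)
print(full.shape)  # (4, 4)

# With flattening: one coefficient over all elements.
r = np.corrcoef(x.flatten(), y.flatten())[0, 1]
print(r)
```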

Alternatively, one could use the equivalent scipy function:

from scipy.stats import pearsonr
r = pearsonr(x,y)[0]

Scipy additionally provides functions for the Spearman correlation coefficient ( spearmanr(x,y)[0] ) and Kendall's tau ( kendalltau(x,y)[0] ).
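
A short sketch of the three scipy calls side by side, assuming scipy is installed. The matrices are hypothetical: T2 is chosen as a monotonic but non-linear function of T1, so the rank-based coefficients should come out at 1 while Pearson stays below 1.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

# Hypothetical data: T2 increases with T1, but not linearly.
rng = np.random.default_rng(0)
T1 = rng.random((4, 5))
T2 = T1 ** 3

x = T1.flatten()
y = T2.flatten()

r_pearson = pearsonr(x, y)[0]    # linear association
r_spearman = spearmanr(x, y)[0]  # rank-based, monotonic association
tau = kendalltau(x, y)[0]        # rank-based, pairwise concordance

print(r_pearson, r_spearman, tau)
```

Index [0] selects the coefficient in each case. The second element of each result is the p-value.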
