
numpy covariance between each column of a matrix and a vector

Based on this post, I can get the covariance between two vectors using np.cov((x,y), rowvar=0). I have an M x N matrix and an M x 1 vector. I want to find the covariance between each column of the matrix and the given vector. I know that I can write a for loop to do this, but I was wondering whether I can somehow use np.cov() to get the result directly.

As Warren Weckesser said, numpy.cov(X, Y) is a poor fit for the job because it simply joins the arrays into one M by (N+1) array and computes the full (N+1) by (N+1) covariance matrix. But we always have the definition of covariance, and it is easy to use:

import numpy as np
A = np.sqrt(np.arange(12).reshape(3, 4))   # some 3 by 4 array
b = np.array([[2], [4], [5]])              # some 3 by 1 vector
cov = np.dot(b.T - b.mean(), A - A.mean(axis=0)) / (b.shape[0]-1)

This returns the covariance of each column of A with b.

array([[ 2.21895142,  1.53934466,  1.3379221 ,  1.20866607]])
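
If this is needed in more than one place, the one-liner can be wrapped in a small helper. The following is a minimal sketch, assuming a hypothetical function name column_cov (it is not part of NumPy) and assuming that b may be passed either as a length-M 1-D array or as an M by 1 column:

```python
import numpy as np

def column_cov(A, b, ddof=1):
    """Covariance of each column of A (M x N) with the vector b (length M).

    Hypothetical helper for illustration; not a NumPy function.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).ravel()   # accept shape (M,) or (M, 1)
    if A.shape[0] != b.shape[0]:
        raise ValueError("A and b must have the same number of rows")
    # Center both inputs, then sum the products of deviations over the M rows.
    return (b - b.mean()) @ (A - A.mean(axis=0)) / (b.shape[0] - ddof)

A = np.sqrt(np.arange(12).reshape(3, 4))
b = np.array([[2], [4], [5]])
result = column_cov(A, b)   # 1-D array of length 4, one value per column of A
```

Unlike the one-liner, this sketch returns a 1-D array of shape (N,) rather than a 1 by N array, which matches the shape of the sliced np.cov result.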

The formula I used is for sample covariance (which is what numpy.cov computes, too), hence the division by (b.shape[0] - 1). If you divide by b.shape[0] instead, you get the unadjusted population covariance.
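
To make the two divisors concrete, here is a short sketch that computes both versions and checks them against np.cov, whose ddof argument selects the divisor M - ddof. The variable names (s, sample_cov, population_cov) are chosen here for illustration only:

```python
import numpy as np

A = np.sqrt(np.arange(12).reshape(3, 4))
b = np.array([[2], [4], [5]])
M = b.shape[0]

# Sum of products of deviations, before any normalization.
s = np.dot(b.T - b.mean(), A - A.mean(axis=0))

sample_cov = s / (M - 1)      # divisor M - 1 (the np.cov default)
population_cov = s / M        # divisor M

# np.cov exposes the same choice through its ddof argument.
assert np.allclose(sample_cov, np.cov(A, b, rowvar=False, ddof=1)[-1, :-1])
assert np.allclose(population_cov, np.cov(A, b, rowvar=False, ddof=0)[-1, :-1])
```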

For comparison, the same computation using np.cov:

import numpy as np
A = np.sqrt(np.arange(12).reshape(3, 4))
b = np.array([[2], [4], [5]])
np.cov(A, b, rowvar=False)[-1, :-1]

Same output, but it takes about twice as long (and for large matrices, the difference will be much larger). The slicing at the end is needed because np.cov computes a 5 by 5 matrix, in which only the first 4 entries of the last row are what you wanted. The rest is the covariance of the columns of A with each other, or of b with itself.
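
A rough way to check the timing claim on your own machine is sketched below. The matrix size (1000 by 200), the random data, and the repeat count are arbitrary assumptions; the actual ratio depends on the shapes and the hardware, so no timings are quoted here.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 200))   # assumed test size: M=1000, N=200
b = rng.standard_normal((1000, 1))

def direct():
    # Computes only the N covariances that are needed.
    return np.dot(b.T - b.mean(), A - A.mean(axis=0)) / (b.shape[0] - 1)

def via_cov():
    # Builds the full (N+1) x (N+1) covariance matrix, then slices the last row.
    return np.cov(A, b, rowvar=False)[-1, :-1]

assert np.allclose(direct(), via_cov())   # both approaches give the same values

t_direct = timeit.timeit(direct, number=20)
t_cov = timeit.timeit(via_cov, number=20)
print(f"direct: {t_direct:.4f} s, np.cov: {t_cov:.4f} s")
```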

Correlation coefficient

The correlation coefficient is obtained by dividing the covariance by the square roots of the variances. Watch out for the -1 adjustment mentioned earlier: numpy.var does not apply it by default, so you need to pass the ddof=1 parameter.

corr = cov / np.sqrt(np.var(b, ddof=1) * np.var(A, axis=0, ddof=1)) 

Check that the output is the same as the less efficient version:

np.corrcoef(A, b, rowvar=False)[-1, :-1]
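
Putting the pieces together, a self-contained check that the two correlation computations agree might look like this (a sketch that reuses the example arrays from earlier in the answer; the name expected is introduced here for illustration):

```python
import numpy as np

A = np.sqrt(np.arange(12).reshape(3, 4))
b = np.array([[2], [4], [5]])

# Sample covariance of each column of A with b.
cov = np.dot(b.T - b.mean(), A - A.mean(axis=0)) / (b.shape[0] - 1)
# ddof=1 so the variances use the same M - 1 divisor as cov.
corr = cov / np.sqrt(np.var(b, ddof=1) * np.var(A, axis=0, ddof=1))

# Reference values from the full correlation matrix.
expected = np.corrcoef(A, b, rowvar=False)[-1, :-1]
print(np.allclose(corr, expected))
```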
