numpy covariance between each column of a matrix and a vector

Question

Based on this post, I can get the covariance between two vectors using np.cov((x,y), rowvar=0). I have an MxN matrix and an Mx1 vector. I want to find the covariance between each column of the matrix and the given vector. I know I can write this with a for loop, but I was wondering whether I can somehow use np.cov() to get the result directly.
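For reference, the for-loop baseline mentioned above might look like the sketch below. The array values and the names A, b and covs are illustrative assumptions, not taken from the original question.

```python
import numpy as np

A = np.sqrt(np.arange(12).reshape(3, 4))   # example M by N matrix (M=3, N=4), assumed data
b = np.array([2, 4, 5])                    # example vector of length M, assumed data

# np.cov of two 1-D arrays returns a 2 by 2 matrix;
# the off-diagonal entry [0, 1] is the covariance between them.
covs = np.array([np.cov(A[:, j], b)[0, 1] for j in range(A.shape[1])])
print(covs)
```

This calls np.cov once per column, which is simple but does N separate small computations.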

Answer 1

As Warren Weckesser said, numpy.cov(X, Y) is a poor fit for this job: it simply joins the arrays into one M by (N+1) array and computes the full (N+1) by (N+1) covariance matrix. But we always have the definition of covariance, and it is easy to use:

import numpy as np
A = np.sqrt(np.arange(12).reshape(3, 4))   # some 3 by 4 array
b = np.array([[2], [4], [5]])              # some 3 by 1 vector
cov = np.dot(b.T - b.mean(), A - A.mean(axis=0)) / (b.shape[0]-1)

This returns the covariances of each column of A with b.

array([[ 2.21895142,  1.53934466,  1.3379221 ,  1.20866607]])

The formula I used is for the sample covariance (which is what numpy.cov computes by default, too), hence the division by (b.shape[0] - 1). If you divide by b.shape[0] instead, you get the unadjusted population covariance.
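To make the two normalizations concrete, here is a small sketch comparing them. The names cross, sample_cov and population_cov are illustrative; the comparison relies on the ddof parameter of np.cov.

```python
import numpy as np

A = np.sqrt(np.arange(12).reshape(3, 4))   # 3 by 4 array
b = np.array([[2], [4], [5]])              # 3 by 1 vector

# Sum of products of the centered values, shape (1, 4)
cross = np.dot(b.T - b.mean(), A - A.mean(axis=0))
n = b.shape[0]

sample_cov = cross / (n - 1)     # agrees with np.cov's default normalization
population_cov = cross / n       # agrees with np.cov(..., ddof=0)

print(sample_cov)
print(population_cov)
```

The two results differ only by the constant factor (n - 1) / n, so the gap shrinks as the number of rows grows.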

For comparison, the same computation using np.cov:

import numpy as np
A = np.sqrt(np.arange(12).reshape(3, 4))
b = np.array([[2], [4], [5]])
np.cov(A, b, rowvar=False)[-1, :-1]

Same output, but it takes about twice as long (and for large matrices, the difference will be much larger). The slicing at the end is needed because np.cov computes a 5 by 5 matrix, of which only the first 4 entries of the last row are what you want. The rest is the covariance of the columns of A with each other, or of b with itself.
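If you want to check the timing claim on your own machine, something along these lines should work. The helper name cov_with_vector, the random test data and the sizes are assumptions made for illustration. Note that this helper returns a 1-D array of length N rather than a 1 by N array.

```python
import timeit
import numpy as np

def cov_with_vector(A, b):
    """Sample covariance of each column of A (M by N) with b (M by 1 or length M)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).ravel()
    return (b - b.mean()) @ (A - A.mean(axis=0)) / (b.shape[0] - 1)

# Hypothetical test data: 200 observations, 300 columns
rng = np.random.default_rng(0)
A_big = rng.standard_normal((200, 300))
b_big = rng.standard_normal((200, 1))

direct = cov_with_vector(A_big, b_big)
via_np_cov = np.cov(A_big, b_big, rowvar=False)[-1, :-1]
same = np.allclose(direct, via_np_cov)

# Time both approaches; the actual numbers depend on hardware and array sizes
t_direct = timeit.timeit(lambda: cov_with_vector(A_big, b_big), number=20)
t_np_cov = timeit.timeit(lambda: np.cov(A_big, b_big, rowvar=False), number=20)
print(f"direct: {t_direct:.4f}s, np.cov: {t_np_cov:.4f}s, results match: {same}")
```

The direct version does work proportional to M*N, while np.cov has to fill an (N+1) by (N+1) matrix, which is why the gap is expected to widen as N grows.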

Correlation coefficient

The correlation coefficient is obtained by dividing the covariance by the square roots of the variances. Watch out for the -1 adjustment mentioned earlier: numpy.var does not apply it by default; to get it you need the ddof=1 parameter.

corr = cov / np.sqrt(np.var(b, ddof=1) * np.var(A, axis=0, ddof=1))

Check that the output is the same as the less efficient version:

np.corrcoef(A, b, rowvar=False)[-1, :-1]
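Putting the pieces together, a self-contained version of that check could look like this. The names reference and agree are illustrative.

```python
import numpy as np

A = np.sqrt(np.arange(12).reshape(3, 4))   # 3 by 4 array
b = np.array([[2], [4], [5]])              # 3 by 1 vector

# Sample covariance of each column of A with b, shape (1, 4)
cov = np.dot(b.T - b.mean(), A - A.mean(axis=0)) / (b.shape[0] - 1)

# Divide by the product of sample standard deviations (ddof=1 on both sides)
corr = cov / np.sqrt(np.var(b, ddof=1) * np.var(A, axis=0, ddof=1))

# Last row of the full 5 by 5 correlation matrix, minus b's correlation with itself
reference = np.corrcoef(A, b, rowvar=False)[-1, :-1]
agree = np.allclose(corr.ravel(), reference)
print(agree)
```

As long as the same ddof is used for both the covariance and the variances, the adjustment cancels out and the correlation comes out the same.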
