How do I calculate the correlation between corresponding columns of two matrices, without getting the other correlations as output?
I have these data:
> a
a b c
1 1 -1 4
2 2 -2 6
3 3 -3 9
4 4 -4 12
5 5 -5 6
> b
d e f
1 6 -5 7
2 7 -4 4
3 8 -3 3
4 9 -2 3
5 10 -1 9
> cor(a,b)
d e f
a 1.0000000 1.0000000 0.1767767
b -1.0000000 -1.000000 -0.1767767
c 0.5050763 0.5050763 -0.6964286
The result I want is just:
cor(a,d) = 1
cor(b,e) = -1
cor(c,f) = -0.6964286
The first answer (diag(cor(a,b))) calculates all pairwise correlations, which is fine unless the matrices are large, and the second one (mapply(cor,a,b)) doesn't work on matrices. As far as I can tell, an efficient computation must be done directly. The following code, borrowed from the arrayMagic Bioconductor package, works efficiently for large matrices:
> colCors = function(x, y) {
+ sqr = function(x) x*x
+ if(!is.matrix(x)||!is.matrix(y)||any(dim(x)!=dim(y)))
+ stop("Please supply two matrices of equal size.")
+ x = sweep(x, 2, colMeans(x))
+ y = sweep(y, 2, colMeans(y))
+ cor = colSums(x*y) / sqrt(colSums(sqr(x))*colSums(sqr(y)))
+ return(cor)
+ }
> set.seed(1)
> a=matrix(rnorm(15),nrow=5)
> b=matrix(rnorm(15),nrow=5)
> diag(cor(a,b))
[1] 0.2491625 -0.5313192 0.5594564
> mapply(cor,a,b)
[1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
> colCors(a,b)
[1] 0.2491625 -0.5313192 0.5594564
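To see where the direct computation pays off, here is a minimal sketch that restates the same idea without console prompts and compares it with diag(cor(x, y)) on larger inputs. The name col_cors and the matrix sizes (2000 x 300) are arbitrary choices for illustration, not part of the original answer; timings will vary by machine, so none are shown.

```r
# Column-wise Pearson correlation, restating the colCors idea above.
# col_cors is a hypothetical name chosen for this sketch.
col_cors <- function(x, y) {
  stopifnot(is.matrix(x), is.matrix(y), all(dim(x) == dim(y)))
  x <- sweep(x, 2, colMeans(x))   # center each column of x
  y <- sweep(y, 2, colMeans(y))   # center each column of y
  colSums(x * y) / sqrt(colSums(x^2) * colSums(y^2))
}

set.seed(1)
x <- matrix(rnorm(2000 * 300), nrow = 2000)
y <- matrix(rnorm(2000 * 300), nrow = 2000)

# Both approaches should agree up to floating-point tolerance
all.equal(col_cors(x, y), diag(cor(x, y)))

# diag(cor(x, y)) builds the full 300 x 300 matrix and discards most of it;
# col_cors only ever computes the 300 values that are needed
system.time(diag(cor(x, y)))
system.time(col_cors(x, y))
```

The gap should widen as the number of columns grows, since the full correlation matrix has ncol(x)^2 entries while only ncol(x) of them are wanted.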
I would probably personally just use diag:
> diag(cor(a,b))
[1] 1.0000000 -1.0000000 -0.6964286
But you could also use mapply:
> mapply(cor,a,b)
a b c
1.0000000 -1.0000000 -0.6964286
mapply works with data frames but not matrices. That is because in data frames each column is an element, while in matrices each entry is an element.
In the answer above, mapply(cor,as.data.frame(a),as.data.frame(b)) works just fine.
set.seed(1)
a=matrix(rnorm(15),nrow=5)
b=matrix(rnorm(15),nrow=5)
diag(cor(a,b))
[1] 0.2491625 -0.5313192 0.5594564
mapply(cor,as.data.frame(a),as.data.frame(b))
V1 V2 V3
0.2491625 -0.5313192 0.5594564
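This is much more efficient for large matrices than diag(cor(a,b)), because it computes only the correlations between corresponding columns rather than the full correlation matrix.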
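If you would rather not convert to data frames, one alternative is to loop over the column indices directly. This is only a sketch: col_cor_loop is a hypothetical helper name, and it assumes both inputs are matrices with the same dimensions.

```r
set.seed(1)
a <- matrix(rnorm(15), nrow = 5)
b <- matrix(rnorm(15), nrow = 5)

# Correlate column j of x with column j of y, one column at a time.
# col_cor_loop is a hypothetical name chosen for this sketch.
col_cor_loop <- function(x, y) {
  stopifnot(is.matrix(x), is.matrix(y), all(dim(x) == dim(y)))
  vapply(seq_len(ncol(x)), function(j) cor(x[, j], y[, j]), numeric(1))
}

col_cor_loop(a, b)   # should match diag(cor(a, b))
```

Like the mapply version, this calls cor once per column pair, so it avoids computing the off-diagonal correlations.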