
Correlation coefficient for three variables in R

For three n-dimensional, non-zero-variance variables a, b, and c, with n > 2, let r(ab), r(bc), and r(ac) be Pearson's correlation coefficients between a and b, between b and c, and between a and c, respectively. The correlation coefficient r(abc) among a, b, and c is then defined as:

r^2(abc) = ( r^2(ab) + r^2(bc) + r^2(ac) ) - ( 2 * r(ab) * r(bc) * r(ac) )

I was able to get the code for a manual way of doing it:

a <- c(4, 6, 2, 7)
b <- c(8, 1, 3, 5)
c <- c(6, 3, 1, 9)

al <- data.frame(a, b, c)
al


# pairwise Pearson correlations
ab_cor <- cor(al$a, al$b, method = "pearson")
bc_cor <- cor(al$b, al$c, method = "pearson")
ac_cor <- cor(al$a, al$c, method = "pearson")

# combine them with the formula above
abc_cor <- sqrt((ab_cor^2 + bc_cor^2 + ac_cor^2) - (2 * ab_cor * bc_cor * ac_cor))
abc_cor

But I was wondering whether this could be done with fewer lines of code, for example with a for loop. Additionally, how would I write it so that it also works with more than 3 variables, for example r(abcd), i.e. using r(ab), r(ac), r(ad), r(bc), r(bd), and r(cd)?

The cor function already returns a matrix of all pairwise correlations when given a data.frame. You just need to pick out the relevant ones (each pair once) and then use some vector operations.

cs <- cor(al, method = "pearson")

cs <- cs[upper.tri(cs)]

# the -2*prod(cs) term goes inside the square root, matching the formula
sqrt(sum(cs^2) - 2*prod(cs))

This generalizes to your larger case as well, assuming that the al data.frame contains exactly the variables you want: with four columns, cs holds the six correlations r(ab), r(ac), r(bc), r(ad), r(bd), and r(cd).
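
For reuse, the same steps could be wrapped in a small function. This is only a sketch: the name multi_cor is hypothetical (it is not a base R function), the fourth variable d is made-up data added purely for illustration, and the function simply applies the formula from the question to however many pairs there are, as suggested above.

```r
# Hypothetical helper: combines all pairwise Pearson correlations of the
# columns of `df` with the formula from the question.
# Assumes every column is numeric and has non-zero variance.
multi_cor <- function(df) {
  stopifnot(ncol(df) >= 3)              # the formula is stated for 3+ variables
  cm <- cor(df, method = "pearson")     # full correlation matrix
  rs <- cm[upper.tri(cm)]               # keep each pair exactly once
  sqrt(sum(rs^2) - 2 * prod(rs))
}

a <- c(4, 6, 2, 7)
b <- c(8, 1, 3, 5)
c <- c(6, 3, 1, 9)
al <- data.frame(a, b, c)

multi_cor(al)                           # same value as abc_cor in the question

# Four variables: `d` is invented example data, not from the question
al4 <- data.frame(a, b, c, d = c(2, 9, 4, 1))
multi_cor(al4)                          # uses all six pairwise correlations
```

Because sum() and prod() do not depend on order, it does not matter in which order upper.tri() returns the pairs.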
