
Calculate correlation by aggregating columns of data frame

I have the following data frame:

y <- data.frame(group = letters[1:5], a = rnorm(5) , b = rnorm(5), c = rnorm(5), d = rnorm(5) )

How can I get a data frame that gives, for each row, the correlation between columns (a, b) and columns (c, d)?

Something like: sapply(y, function(x) {cor(x[2:3],x[4:5])})

Thank you, S

You could use apply:

> apply(y[,-1],1,function(x) cor(x[1:2],x[3:4]))
[1] -1 -1  1 -1  1

Or ddply from the plyr package, although this might be overkill. Note that if two rows share the same group, their values are pooled: the correlation is computed between the combined a and b values and the combined c and d values of those rows.

> ddply(y,.(group),function(x) cor(c(x$a,x$b),c(x$c,x$d)))
  group V1
1     a -1
2     b -1
3     c  1
4     d -1
5     e  1
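To illustrate the pooling behaviour without needing plyr, here is a minimal base R sketch. The data frame y2 and the seed are made up for illustration; split() plus sapply() should behave much like the ddply call above, but returns a named vector rather than a data frame.

```r
# Hypothetical data: group "a" appears twice, so its two rows are pooled.
set.seed(1)  # arbitrary seed, only for reproducibility
y2 <- data.frame(group = c("a", "a", "b"),
                 a = rnorm(3), b = rnorm(3), c = rnorm(3), d = rnorm(3))

# Split the rows by group, then correlate the pooled (a, b) values
# against the pooled (c, d) values within each group.
pooled <- sapply(split(y2, y2$group),
                 function(x) cor(c(x$a, x$b), c(x$c, x$d)))
pooled
```

Group "a" now contributes four values to each side, so its correlation is no longer forced to be +1 or -1, while group "b" still has only two.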

You can use apply to apply a function to each row (or column) of a matrix, array or data.frame.

apply(
  y[,-1], # Remove the first column, to ensure that u remains numeric
  1,      # Apply the function on each row
  function(u) cor( u[1:2], u[3:4] )
)

(With just 2 observations, the correlation can only be +1 or -1.)
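The reason for dropping the first column can be checked directly. apply converts a data frame with as.matrix before iterating, and a matrix holds a single type, so one non-numeric column turns every value into a string. A small sketch, reusing the data frame from the question (the names with_group and without_group are only illustrative):

```r
y <- data.frame(group = letters[1:5], a = rnorm(5), b = rnorm(5),
                c = rnorm(5), d = rnorm(5))

# This is the conversion apply() performs internally on a data frame.
with_group    <- as.matrix(y)        # group column included
without_group <- as.matrix(y[, -1])  # group column dropped

typeof(with_group)     # "character": cor() would fail on these rows
typeof(without_group)  # "double"
```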

You're almost there: you just need to use apply instead of sapply, and drop the non-numeric group column.

apply(y[-1], 1, function(x) cor(x[1:2], x[3:4]))

Of course, the correlation between two length-2 vectors isn't very informative.
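A quick way to see why: two points always lie exactly on a line, so the correlation only records whether the two vectors move in the same direction. The sketch below uses made-up random vectors x and z and an arbitrary seed, and assumes neither vector has two identical values (otherwise cor returns NA).

```r
set.seed(42)  # arbitrary seed, only for reproducibility
x <- rnorm(2)
z <- rnorm(2)

r <- cor(x, z)

# With two observations, r equals the sign of the product of the
# two differences: +1 if x and z change in the same direction, -1 otherwise.
expected <- sign(diff(x) * diff(z))
all.equal(r, expected)
```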
