How to create a for loop to obtain correlations from a data frame in R?

I have the following data frame:

Gene <- c("1","2","3","4","5","6")
A1.1 <- c(1,1,2,4,3,5)
B1.1 <- c(1,2,3,4,5,6)
C1.1 <- c(2,2,3,5,5,5)
A1.2 <- c(1,2,3,5,5,5)
B1.2 <- c(3,2,5,6,6,6)
C1.2 <- c(1,1,2,2,4,6)
df <- data.frame(Gene, A1.1, B1.1, C1.1, A1.2, B1.2, C1.2)

   Gene A1.1 B1.1 C1.1 A1.2 B1.2 C1.2
1    1    1    1    2    1    3    1
2    2    1    2    2    2    2    1
3    3    2    3    3    3    5    2
4    4    4    4    5    5    6    2
5    5    3    5    5    5    6    4
6    6    5    6    5    5    6    6

I need the correlation between columns that share the same letter: A1.1 with A1.2, B1.1 with B1.2, and C1.1 with C1.2, for a total of 3 correlation values.

I can do this by calling cor() on each pair (e.g. cor(df$A1.1, df$A1.2)), but is there a for loop that would obtain all of these correlations at once?

You could use split.default to group the columns by name prefix, then correlate the two columns in each group:

sapply(split.default(df[-1], sub('\\.\\d+$', '', names(df)[-1])),
       function(x) cor(x[[1]], x[[2]]))

       A1        B1        C1 
0.9042908 0.8546548 0.7656415 
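Since the question asks specifically for a for loop, the same grouping can also be written as an explicit loop. This is only a sketch: it assumes every prefix has exactly one ".1" column and one ".2" column, and the names prefixes and cors are made up for the illustration.

```r
df <- data.frame(
  Gene = c("1", "2", "3", "4", "5", "6"),
  A1.1 = c(1, 1, 2, 4, 3, 5),
  B1.1 = c(1, 2, 3, 4, 5, 6),
  C1.1 = c(2, 2, 3, 5, 5, 5),
  A1.2 = c(1, 2, 3, 5, 5, 5),
  B1.2 = c(3, 2, 5, 6, 6, 6),
  C1.2 = c(1, 1, 2, 2, 4, 6)
)

# Strip the trailing ".<digits>" to get one prefix per pair: "A1", "B1", "C1"
prefixes <- unique(sub("\\.\\d+$", "", names(df)[-1]))

# Pre-allocate a named numeric vector, one slot per prefix
cors <- setNames(numeric(length(prefixes)), prefixes)

for (p in prefixes) {
  # Assumes columns "<prefix>.1" and "<prefix>.2" both exist
  cors[p] <- cor(df[[paste0(p, ".1")]], df[[paste0(p, ".2")]])
}

cors
```

The loop returns the same three values as the sapply() call, so which one to use is mostly a matter of taste.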

If a group can contain more than two columns, compute the full correlation matrix for each group instead:

lapply(split.default(df[-1], sub('\\.\\d+$', '', names(df)[-1])), cor)
$A1
          A1.1      A1.2
A1.1 1.0000000 0.9042908
A1.2 0.9042908 1.0000000

$B1
          B1.1      B1.2
B1.1 1.0000000 0.8546548
B1.2 0.8546548 1.0000000

$C1
          C1.1      C1.2
C1.1 1.0000000 0.7656415
C1.2 0.7656415 1.0000000

If you have more than two columns with the same letter, correlation matrices may be more convenient, since every column has to be compared with every other column in its group:

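cor_list <- list()

# Leading letter of each data column (Gene is excluded)
col_names <- colnames(df)[-1]
column_letters <- unique(substr(col_names, 1, 1))

for (let in column_letters) {
  # Select every data column whose name starts with this letter
  cols <- col_names[substr(col_names, 1, 1) == let]
  cor_list[[let]] <- cor(df[cols])
}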
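When each letter has exactly two columns, as in the question, the single value of interest is the off-diagonal entry of each matrix. A minimal sketch of pulling it out, assuming cor_list was built as above (pair_cors is a name chosen for the example):

```r
df <- data.frame(
  Gene = c("1", "2", "3", "4", "5", "6"),
  A1.1 = c(1, 1, 2, 4, 3, 5),
  B1.1 = c(1, 2, 3, 4, 5, 6),
  C1.1 = c(2, 2, 3, 5, 5, 5),
  A1.2 = c(1, 2, 3, 5, 5, 5),
  B1.2 = c(3, 2, 5, 6, 6, 6),
  C1.2 = c(1, 1, 2, 2, 4, 6)
)

# Build one correlation matrix per leading letter
col_names <- colnames(df)[-1]
cor_list <- list()
for (let in unique(substr(col_names, 1, 1))) {
  cols <- col_names[substr(col_names, 1, 1) == let]
  cor_list[[let]] <- cor(df[cols])
}

# Row 1, column 2 of a 2x2 correlation matrix is the pairwise correlation
pair_cors <- sapply(cor_list, function(m) m[1, 2])
pair_cors
```

With three or more columns per letter there is no single value to extract, so the full matrices are the thing to keep.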

When your columns are as neatly arranged as shown, with the .1 columns followed by the .2 columns in the same order, you can correlate the pairs by position:

sapply(0:2, function(i) cor(df[, 2 + i], df[, 5 + i]))
# [1] 0.9042908 0.8546548 0.7656415
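The positional version breaks silently if the columns are ever reordered. One possible variant looks the pairs up by name and takes the diagonal of the cross-correlation matrix that cor() returns when given two data frames. It assumes every ".1" column has a matching ".2" column; the variable names first, second and pair_cors are illustrative.

```r
df <- data.frame(
  Gene = c("1", "2", "3", "4", "5", "6"),
  A1.1 = c(1, 1, 2, 4, 3, 5),
  B1.1 = c(1, 2, 3, 4, 5, 6),
  C1.1 = c(2, 2, 3, 5, 5, 5),
  A1.2 = c(1, 2, 3, 5, 5, 5),
  B1.2 = c(3, 2, 5, 6, 6, 6),
  C1.2 = c(1, 1, 2, 2, 4, 6)
)

# Names of the ".1" columns and their ".2" partners, matched by name
first  <- grep("\\.1$", names(df), value = TRUE)
second <- sub("\\.1$", ".2", first)

# cor(x, y) correlates every column of x with every column of y;
# the diagonal holds the matched pairs (A1.1/A1.2, B1.1/B1.2, ...)
pair_cors <- diag(cor(df[first], df[second]))
names(pair_cors) <- sub("\\.1$", "", first)
pair_cors
```

This computes a few unneeded off-diagonal correlations, which is harmless for a handful of columns but worth avoiding for very wide data.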
