
How to calculate correlations between certain columns in a list of dataframes?

I need to generate 20 different samples (n = 100) from the standard normal distribution (X0, X1, X2, ..., X19) and calculate the correlation between X0 and each of the other samples X1...X19. I know how to do this for one "whole sample" (X0...X19), but I need to do it for several samples of X0...X19 at once. I tried generating a list of data frames (each data frame containing one sample of X0...X19) and iterating through it, but it failed for some reason.
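For a single "whole sample" the computation is one cor() call, because cor() accepts a numeric vector as x and a data frame as y. A minimal sketch, assuming the sample is held in a data frame whose columns are renamed X0...X19 (the seed, the column names and sample_df are illustrative, not from the question):

```r
set.seed(123)  # illustrative seed, only so the numbers are repeatable

# One "whole sample": 20 columns of n = 100 standard normal draws
sample_df <- as.data.frame(replicate(20, rnorm(100)))
names(sample_df) <- paste0("X", 0:19)  # hypothetical names mirroring X0..X19

# Correlate X0 with each of X1..X19 in one call;
# the result is a 1 x 19 matrix with columns named X1..X19
r <- cor(sample_df$X0, sample_df[-1])
dim(r)  # 1 19
```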

My data looks like this:

dataframes <- replicate(10, as.data.frame(replicate(20, rnorm(100))))

head(dataframes)

#   [,1]        [,2]        [,3]        [,4]        [,5]        [,6]       
#V1 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V2 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V3 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V4 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V5 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V6 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#   [,7]        [,8]        [,9]        [,10]      
#V1 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V2 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V3 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V4 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V5 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V6 Numeric,100 Numeric,100 Numeric,100 Numeric,100

I tried calculating the correlations like this:

lapply(frames,
       function(x){
                   cor(x[,1]$V1, x[-c(1:1)])
                   return(x)
                   }
                    ) 

But this resulted in an error:

Error in x[, 1]: incorrect number of dimensions

I'm not very familiar with lapply or loops in general, so I could really use some help.

Your example is not reproducible as posted. One problem is that your data is not a data.frame or a list:

class(dataframes) 
[1] "matrix" "array"
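To see where the matrix comes from, compare replicate() with its default simplification and with simplify = FALSE. A small sketch with illustrative dimensions (3 columns of 5 values, 2 replicates) and an illustrative seed; it also shows why x[, 1] inside lapply() fails:

```r
set.seed(1)  # illustrative seed, only for repeatability

# Default simplification: the data frames are merged into one matrix
# of mode list; each cell holds a single column (a numeric vector)
simplified <- replicate(2, as.data.frame(replicate(3, rnorm(5))))
class(simplified)  # "matrix" "array" (R >= 4.0.0)
dim(simplified)    # 3 2: rows V1..V3, one column per replicate

# lapply() walks over the cells, so each x it receives is a bare
# numeric vector, which is why x[, 1] reports "incorrect number of dimensions"
first <- lapply(simplified, identity)[[1]]
is.numeric(first)  # TRUE
length(first)      # 5

# simplify = FALSE keeps each replicate as its own data frame in a list
kept <- replicate(2, as.data.frame(replicate(3, rnorm(5))), simplify = FALSE)
class(kept)        # "list"
class(kept[[1]])   # "data.frame"
```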

In addition, there are a few easy mistakes: using the wrong object name (frames instead of dataframes), returning x from the function in lapply (x is the input here, not the result), and subsetting x twice (x[,1]$V1). Fixing these minor mistakes solves your problem:

dataframes <- replicate(10,
                        as.data.frame(replicate(20, rnorm(100))),
                        simplify = FALSE) # <=== fix: keep a list of data frames
lapply(dataframes, # <=== name corrected
       function(x){
                   cor(x$V1, x[-1]) # no need to subset x before `$V1`
                   # return(x) # <== Remove return x
         }
       ) 
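If you would rather have one matrix than a list of 1 x 19 matrices (one row per sample, one column per V2...V20), sapply() can bind the results. A minimal sketch; the seed and the name cor_matrix are illustrative additions, and the data is regenerated so the block runs on its own:

```r
set.seed(42)  # illustrative seed, only so the numbers are repeatable

# 10 samples, each a data frame of 20 standard normal columns V1..V20 (n = 100)
dataframes <- replicate(10,
                        as.data.frame(replicate(20, rnorm(100))),
                        simplify = FALSE)

# cor() returns a 1 x 19 matrix per sample; drop() turns it into a
# named vector, sapply() binds the vectors as columns (19 x 10),
# and t() transposes so each row is one sample
cor_matrix <- t(sapply(dataframes, function(x) drop(cor(x$V1, x[-1]))))

dim(cor_matrix)            # 10 19
colnames(cor_matrix)[1:3]  # "V2" "V3" "V4"
```

Each row of cor_matrix then holds the 19 correlations for one replicated sample, which makes it easy to summarise them, for example with colMeans(cor_matrix).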
