How to calculate the correlation between certain columns in a list of data frames?

Question

I need to generate 20 different samples (n=100) from the standard normal distribution (X0, X1, X2, ..., X19) and calculate the correlation between X0 and all the other samples X1...X19. I know how to do this for one "whole sample" (X0...X19), but I need to do it for several samples of X0...X19 simultaneously. I tried generating a list of data frames (each data frame containing one sample of X0...X19) and iterating through it, but it failed for some reason.

My data looks like this:

dataframes <- replicate(10, as.data.frame(replicate(20, rnorm(100))))

head(dataframes)

#   [,1]        [,2]        [,3]        [,4]        [,5]        [,6]       
#V1 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V2 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V3 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V4 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V5 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V6 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#   [,7]        [,8]        [,9]        [,10]      
#V1 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V2 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V3 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V4 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V5 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V6 Numeric,100 Numeric,100 Numeric,100 Numeric,100

I tried calculating the correlations like this:

lapply(frames,
       function(x){
                   cor(x[,1]$V1, x[-c(1:1)])
                   return(x)
                   }
                    )

But this resulted in an error:

Error in x[, 1]: incorrect number of dimensions

I'm not very familiar with lapply or loops in general, so I could really use some help.

Answer 1

Your reproducible example is not reproducible. One problem is that your data is not a data.frame or a list:

class(dataframes) 
[1] "matrix" "array"
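The reason is that replicate() simplifies its result by default, so the ten data frames get folded into a single matrix whose cells each hold one column. A minimal sketch of the difference, using smaller sizes than in the question and hypothetical names m and l chosen only for illustration:

```r
set.seed(1)  # arbitrary seed, only so the run is repeatable

# Default simplify = TRUE: the data frames are collapsed into a list-matrix
m <- replicate(3, as.data.frame(replicate(4, rnorm(5))))
is.matrix(m)   # TRUE
dim(m)         # 4 3: one row per column V1..V4, one column per replicate

# simplify = FALSE keeps each replicate as its own data frame in a list
l <- replicate(3, as.data.frame(replicate(4, rnorm(5))), simplify = FALSE)
class(l)       # "list"
class(l[[1]])  # "data.frame"
```

With the matrix form, lapply() iterates over the individual cells (numeric vectors), which is why x[, 1] fails with "incorrect number of dimensions".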

In addition, there are a few easy mistakes, such as returning x in lapply (x is the input here, not the result) and subsetting x twice. Fixing these minor mistakes solves your problem:

dataframes <- replicate(10, 
                        as.data.frame(replicate(20, rnorm(100))),
                        simplify = FALSE) # <=== fix: keep a list of data frames
lapply(dataframes, # <=== name corrected (was `frames`)
       function(x){
                   cor(x$V1, x[-1]) # no need to subset x before `$V1`
                   # return(x) # <== removed: this returned the input, not the result
         }
       )
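If a single table is more convenient than a list of 1 x 19 matrices, one possible variation (not part of the original answer) is to use sapply and drop the matrix dimension of each result. The name cors and the seed are assumptions made for this sketch:

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable

dataframes <- replicate(10,
                        as.data.frame(replicate(20, rnorm(100))),
                        simplify = FALSE)

# cor() returns a 1 x 19 matrix here; [1, ] turns it into a named vector,
# so sapply can bind the ten results into one matrix
cors <- sapply(dataframes, function(x) cor(x$V1, x[-1])[1, ])

dim(cors)             # 19 10: rows are V2..V20, columns are the ten data frames
head(rownames(cors))  # "V2" "V3" "V4" "V5" "V6" "V7"
```

Each column of cors then holds the correlations of V1 with V2..V20 for one data frame.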
