How to calculate correlations between certain columns in a list of dataframes?
I need to generate 20 different samples (n=100) from the standard normal distribution (X0, X1, X2, ..., X19) and calculate the correlation between X0 and all the other samples X1...X19. I know how to do this for one "whole sample" (X0...X19), but I need to do it for several samples of X0...X19 at once. I tried generating a list of dataframes (each dataframe containing one sample of X0...X19) and iterating through it, but it failed for some reason.
My data looks like this:
dataframes <- replicate(10, as.data.frame(replicate(20, rnorm(100))))
head(dataframes)
# [,1] [,2] [,3] [,4] [,5] [,6]
#V1 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V2 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V3 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V4 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V5 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V6 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100 Numeric,100
# [,7] [,8] [,9] [,10]
#V1 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V2 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V3 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V4 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V5 Numeric,100 Numeric,100 Numeric,100 Numeric,100
#V6 Numeric,100 Numeric,100 Numeric,100 Numeric,100
I tried calculating the correlations like this:
lapply(frames,
function(x){
cor(x[,1]$V1, x[-c(1:1)])
return(x)
}
)
But this resulted in an error:
Error in x[, 1]: incorrect number of dimensions
I'm not very familiar with lapply or loops in general, so I could really use some help.
Your reproducible example is not reproducible. One problem is that your data is neither a data.frame nor a plain list of data.frames; replicate() has simplified it into a matrix:
class(dataframes)
[1] "matrix" "array"
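As a small illustration of why that matters (a sketch with smaller, made-up dimensions; the names m and l are only for this example): by default replicate() simplifies its result, so the data.frames are flattened into a list-mode matrix whose cells are plain numeric vectors. lapply() then visits those vectors one by one, and x[, 1] on a vector gives the "incorrect number of dimensions" error.

```r
set.seed(1)

# Default behaviour: the 3 data.frames (4 columns each) are simplified
# into a 4 x 3 matrix of mode "list"; each cell holds one numeric column.
m <- replicate(3, as.data.frame(replicate(4, rnorm(5))))
dim(m)          # 4 rows (V1..V4) by 3 columns (one per replicate)
class(m[[1]])   # a bare numeric vector, so x[, 1] cannot work on it

# With simplify = FALSE the result stays a list of data.frames.
l <- replicate(3, as.data.frame(replicate(4, rnorm(5))), simplify = FALSE)
class(l)        # "list"
class(l[[1]])   # "data.frame"
```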
In addition, there are a few easy mistakes, like returning x in lapply (x is the input here, not the result) and double subsetting x. Fixing these minor mistakes fixes your problem:
dataframes <- replicate(10,
                        as.data.frame(replicate(20, rnorm(100))),
                        simplify = FALSE) # <=== fix: keep a list of data.frames
lapply(dataframes, # <=== name corrected (was `frames`)
       function(x){
         cor(x$V1, x[-1]) # no need to subset x before `$V1`
         # return(x) # <== removed: it returned the input, not the correlations
       }
)
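If a single table of results is more convenient than a list of 1 x 19 matrices, one possible variation (a sketch, not part of the fix above; the name cors and the seed are only illustrative) is to drop each result to a vector and let sapply() bind the vectors column-wise:

```r
set.seed(42)

# A list of 10 data.frames, each with 20 standard-normal columns (V1..V20).
dataframes <- replicate(10,
                        as.data.frame(replicate(20, rnorm(100))),
                        simplify = FALSE)

# cor() returns a 1 x 19 matrix for each data.frame; drop() turns it into
# a vector, and sapply() collects the vectors as columns.
cors <- sapply(dataframes, function(x) drop(cor(x$V1, x[-1])))

dim(cors)  # 19 correlations (rows) for each of the 10 data.frames (columns)
```

Each column of cors then holds the correlations of V1 with the remaining columns for one data.frame.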