R apply correlation function to a list
I have a data frame like this:
set.seed(1)
category <- c(rep('A', 100), rep('B', 100), rep('C', 100))
var1 <- rnorm(300)  # rnorm() takes a count; rnorm(1:300) only used the length of 1:300
var2 <- rnorm(300)
df <- data.frame(category = category, var1 = var1, var2 = var2)
I need to calculate the correlations between var1 and var2 by category. I think I can first split the df by category and then apply the cor function to the resulting list. But I am really confused about how to use the lapply function. Could someone kindly help me out?
This should produce the desired result:
lapply(split(df, df$category), function(dfs) cor(dfs$var1, dfs$var2))
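Since the question was about how lapply fits in, here is the same call broken into its two steps. This is only a sketch: the names pieces and cors are made up for illustration, and the data is rebuilt so the snippet runs on its own.

```r
# Rebuild the example data so this sketch is self-contained.
set.seed(1)
category <- rep(c("A", "B", "C"), each = 100)
var1 <- rnorm(300)
var2 <- rnorm(300)
df <- data.frame(category = category, var1 = var1, var2 = var2)

# Step 1: split() returns a named list with one data frame per category.
pieces <- split(df, df$category)
names(pieces)    # "A" "B" "C"
nrow(pieces$A)   # 100

# Step 2: lapply() calls the function once per list element and
# returns a list that keeps the same names.
cors <- lapply(pieces, function(d) cor(d$var1, d$var2))
str(cors)
```

Each element of cors is a single number, so cors$A is the correlation for category A.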
EDIT:
You can also use by (as suggested by @thelatemail):
by(df, df$category, function(x) cor(x$var1,x$var2))
You can use sapply to get the same thing, but as a vector instead of a list:
sapply(split(df, df$category), function(dfs) cor(dfs$var1, dfs$var2))
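To make the list-versus-vector difference concrete, the sketch below runs both calls on the same data and also shows vapply, a stricter base R relative of sapply that was not part of the original answers. The variable names (pieces, as_list, as_vector, checked) are only illustrative.

```r
# Rebuild the example data so this sketch is self-contained.
set.seed(1)
category <- rep(c("A", "B", "C"), each = 100)
var1 <- rnorm(300)
var2 <- rnorm(300)
df <- data.frame(category = category, var1 = var1, var2 = var2)

pieces <- split(df, df$category)

# lapply() always returns a list; sapply() simplifies to a named vector here.
as_list   <- lapply(pieces, function(d) cor(d$var1, d$var2))
as_vector <- sapply(pieces, function(d) cor(d$var1, d$var2))

is.list(as_list)        # TRUE
is.numeric(as_vector)   # TRUE
names(as_vector)        # "A" "B" "C"

# vapply() works like sapply() but requires each result to match the
# declared template (one numeric value) and errors otherwise.
checked <- vapply(pieces, function(d) cor(d$var1, d$var2), numeric(1))
stopifnot(isTRUE(all.equal(checked, as_vector)))
```

vapply is usually the safer choice inside functions, because sapply can silently return a list when the results do not all have the same length.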
And just for comparison, here's how you'd do it with the dplyr package.
library(dplyr)
df %>% group_by(category) %>% summarize(cor=cor(var1,var2))
# category cor
# 1 A -0.05043706
# 2 B 0.13519013
# 3 C -0.04186283