Spearman correlation by group in R
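How do you compute a Spearman correlation by group in R? I found the following link, which discusses computing the Pearson correlation by group, but when I try replacing the method with spearman, it doesn't work.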
https://stats.stackexchange.com/questions/4040/r-compute-correlation-by-group
For a base R solution, how about this:
df <- data.frame(group = rep(c("G1", "G2"), each = 10),
var1 = rnorm(20),
var2 = rnorm(20))
r <- by(df, df$group, FUN = function(X) cor(X$var1, X$var2, method = "spearman"))
r
# df$group: G1
# [1] 0.4060606
# ------------------------------------------------------------
# df$group: G2
# [1] 0.1272727
Then, if you want the results as a data.frame:
data.frame(group = dimnames(r)[[1]], corr = as.vector(r))
# group corr
# 1 G1 0.4060606
# 2 G2 0.1272727
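As a sanity check, Spearman's rho is the Pearson correlation computed on the ranks of each variable, so the per-group values from by() can be reproduced that way. A minimal sketch on simulated data of the same shape; the set.seed() call and the names spearman and rank_cor are illustrative additions, not part of the answer above:

```r
# Simulated data in the same shape as above (the seed is an illustrative addition)
set.seed(1)
df <- data.frame(group = rep(c("G1", "G2"), each = 10),
                 var1 = rnorm(20),
                 var2 = rnorm(20))

# Spearman's rho per group, exactly as in the by() answer
spearman <- by(df, df$group, function(X) cor(X$var1, X$var2, method = "spearman"))

# Pearson correlation of the ranks, per group (hypothetical name, for comparison only)
rank_cor <- by(df, df$group, function(X) cor(rank(X$var1), rank(X$var2)))

# The two agree up to floating-point error
all.equal(as.vector(spearman), as.vector(rank_cor))
# [1] TRUE
```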
Edit: If you prefer a plyr-based solution, here is one:
library(plyr)
ddply(df, .(group), summarise, "corr" = cor(var1, var2, method = "spearman"))
Very old question, but this tidyverse and broom solution is really simple, so I had to share it:
set.seed(123)
df <- data.frame(group = rep(c("G1", "G2"), each = 10),
var1 = rnorm(20),
var2 = rnorm(20))
library(tidyverse)
library(broom)
df %>%
group_by(group) %>%
summarize(correlation = cor(var1, var2, method = "spearman"))
# # A tibble: 2 x 2
#   group correlation
#   <fct>       <dbl>
# 1 G1        -0.200
# 2 G2         0.0545
# with p-values and further statistics
df %>%
nest(-group) %>%
mutate(cor=map(data,~cor.test(.x$var1, .x$var2, method = "sp"))) %>%
mutate(tidied = map(cor, tidy)) %>%
unnest(tidied, .drop = T)
# # A tibble: 2 x 6
#   group estimate statistic p.value method                          alternative
#   <fct>    <dbl>     <dbl>   <dbl> <chr>                           <chr>
# 1 G1     -0.200        198   0.584 Spearman's rank correlation rho two.sided
# 2 G2      0.0545       156   0.892 Spearman's rank correlation rho two.sided
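With newer tidyr/dplyr versions, you need to write the code like this to get the results above without errors (nest() now takes the nested columns via data =, and unnest() no longer accepts .drop):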
df %>%
nest(data = -group) %>%
mutate(cor=map(data,~cor.test(.x$var1, .x$var2, method = "sp"))) %>%
mutate(tidied = map(cor, tidy)) %>%
unnest(tidied) %>%
select(-data, -cor)
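If you want the p-values without the tidyverse, the same per-group cor.test() call can be collected with base R alone. This is a sketch assuming the same df columns (group, var1, var2); the intermediate names tests and res, and the result column names, are my own choices:

```r
set.seed(123)
df <- data.frame(group = rep(c("G1", "G2"), each = 10),
                 var1 = rnorm(20),
                 var2 = rnorm(20))

# Run a Spearman test within each group; each element is an "htest" object
tests <- lapply(split(df, df$group), function(x) {
  cor.test(x$var1, x$var2, method = "spearman")
})

# Pull the estimate (rho), the S statistic and the p-value out of each test
res <- data.frame(
  group     = names(tests),
  estimate  = vapply(tests, function(t) unname(t$estimate), numeric(1)),
  statistic = vapply(tests, function(t) unname(t$statistic), numeric(1)),
  p.value   = vapply(tests, function(t) t$p.value, numeric(1)),
  row.names = NULL
)
res
```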
Here is another approach:
# split the data by group then apply spearman correlation
# to each element of that list
j <- lapply(split(df, df$group), function(x){cor(x[,2], x[,3], method = "spearman")})
# Bring it together
data.frame(group = names(j), corr = unlist(j), row.names = NULL)
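A variant of the same split-and-apply idea uses vapply(), which errors unless each group returns exactly one number, and refers to the columns by name instead of by position, so it does not depend on column order. A sketch on simulated data; the name rho is illustrative:

```r
set.seed(42)
df <- data.frame(group = rep(c("G1", "G2"), each = 10),
                 var1 = rnorm(20),
                 var2 = rnorm(20))

# One Spearman coefficient per group; vapply() enforces a single numeric result per group
rho <- vapply(split(df, df$group),
              function(x) cor(x$var1, x$var2, method = "spearman"),
              numeric(1))

data.frame(group = names(rho), corr = unname(rho))
```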
Comparing my approach, Josh's approach, and the plyr solution using rbenchmark:
Dason <- function(){
# split the data by group then apply spearman correlation
# to each element of that list
j <- lapply(split(df, df$group), function(x){cor(x[,2], x[,3], method = "spearman")})
# Bring it together
data.frame(group = names(j), corr = unlist(j), row.names = NULL)
}
Josh <- function(){
r <- by(df, df$group, FUN = function(X) cor(X$var1, X$var2, method = "spearman"))
data.frame(group = attributes(r)$dimnames[[1]], corr = as.vector(r))
}
plyr <- function(){
ddply(df, .(group), summarise, "corr" = cor(var1, var2, method = "spearman"))
}
library(rbenchmark)
benchmark(Dason(), Josh(), plyr())
This gives the output
> benchmark(Dason(), Josh(), plyr())
test replications elapsed relative user.self sys.self user.child sys.child
1 Dason() 100 0.19 1.000000 0.19 0 NA NA
2 Josh() 100 0.24 1.263158 0.22 0 NA NA
3 plyr() 100 0.51 2.684211 0.52 0 NA NA
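So it looks like my approach is slightly faster, but not by much. I think Josh's approach is a bit more intuitive. The plyr solution is the easiest to write; it isn't the fastest, but it is certainly more convenient!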
If you want an efficient solution for a large number of groups, data.table is the way to go.
library(data.table)
DT <- as.data.table(df)
setkey(DT, group)
DT[,list(corr = cor(var1,var2,method = 'spearman')), by = group]
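If you would rather stay in base R with many groups, one option is to split only the row indices with tapply() instead of splitting the whole data frame, which avoids creating a separate data frame for each group. This is a sketch on made-up data (the group count, group size and the name big are arbitrary), and it makes no claim about speed relative to data.table:

```r
set.seed(7)
# Many groups: 1000 groups of 20 rows each (sizes chosen arbitrarily for illustration)
n_groups <- 1000
big <- data.frame(group = rep(sprintf("G%04d", seq_len(n_groups)), each = 20),
                  var1 = rnorm(20 * n_groups),
                  var2 = rnorm(20 * n_groups))

# Split row indices (not the data frame) by group, then correlate each subset
rho <- tapply(seq_len(nrow(big)), big$group, function(i) {
  cor(big$var1[i], big$var2[i], method = "spearman")
})

head(data.frame(group = names(rho), corr = as.vector(rho)))
```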