How can I calculate the correlation between groups in R using dplyr?

Question

Let's say I have a data frame in R that looks like this:

library(tibble)  # provides tibble(); also available via library(dplyr)

var = c(rep("A",3),rep("B",3),rep("C",3),rep("D",3),rep("E",3))
y = rnorm(15)
data = tibble(var,y);data

With output:

# A tibble: 15 x 2
   var        y
   <chr>  <dbl>
 1 A     -1.23 
 2 A     -0.983
 3 A      1.28 
 4 B     -0.268
 5 B     -0.460
 6 B     -1.23 
 7 C      1.87 
 8 C      0.416
 9 C     -1.99 
10 D      0.289
11 D      1.70 
12 D     -0.455
13 E     -0.648
14 E      0.376
15 E     -0.887

I want to calculate the correlation of each distinct pair of groups in R using dplyr. Ideally, the result should look like this (the third column contains the correlation of each pair):

var1	var2	value
A	B	cor(A,B)
A	C	cor(A,C)
A	D	cor(A,D)
A	E	cor(A,E)
B	C	cor(B,C)
B	D	cor(B,D)
B	E	cor(B,E)
C	D	cor(C,D)
C	E	cor(C,E)
D	E	cor(D,E)

How can I do that in R? Any help is appreciated.

Additional

Suppose I have another grouping variable, say group2:

var2 = c(rep("A",3),rep("B",3),rep("C",3),rep("D",3),rep("E",3),rep("F",3),
        rep("H",3),rep("I",3))

y2 = rnorm(24)
group2 = c(rep(1,6),rep(2,6),rep(3,6),rep(1,6))
data2 = tibble(var2,group2,y2);data2

Ideally, the result should look like this:

group	var1	var2	value
1	A	B	cor(A,B)
1	A	H	cor(A,H)
1	A	I	cor(A,I)
1	B	H	cor(B,H)
1	B	I	cor(B,I)
1	H	I	cor(H,I)
2	C	D	cor(C,D)
3	E	F	cor(E,F)

How can I calculate the correlation for each pair of variables in column var2 within each group of group2?

Answer 1

Another possible solution:

library(tidyverse)

# df is the question's data frame (columns var and y)
df %>% 
  group_by(var) %>% 
  group_map(~ data.frame(.x) %>% set_names(.y)) %>% 
  bind_cols %>% cor %>% 
  {data.frame(row=rownames(.)[row(.)[upper.tri(.)]], 
              col=colnames(.)[col(.)[upper.tri(.)]], 
              corr=.[upper.tri(.)])}

#>    row col       corr
#> 1    A   B -0.9949738
#> 2    A   C -0.9574502
#> 3    B   C  0.9815368
#> 4    A   D -0.7039708
#> 5    B   D  0.6293137
#> 6    C   D  0.4690460
#> 7    A   E -0.5755463
#> 8    B   E  0.4907660
#> 9    C   E  0.3150499
#> 10   D   E  0.9859711
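
The last step of this pipeline keeps only the upper triangle of the correlation matrix, so each pair appears once. A minimal sketch of that step in isolation, using a made-up 3 x 3 symmetric matrix (the values are illustrative and are not taken from the question's data):

```r
# Toy symmetric matrix standing in for the output of cor()
m <- matrix(c(1.0, 0.2, 0.3,
              0.2, 1.0, 0.5,
              0.3, 0.5, 1.0),
            nrow = 3,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Logical mask: TRUE strictly above the diagonal
keep <- upper.tri(m)

# row(m) and col(m) give the row and column index of every cell,
# so indexing them with the mask recovers the labels of each pair
pairs <- data.frame(
  row  = rownames(m)[row(m)[keep]],
  col  = colnames(m)[col(m)[keep]],
  corr = m[keep]
)
pairs
```

Because R stores matrices column by column, the pairs come out ordered by the second variable, which is why the output above lists A-B, A-C, B-C, A-D, and so on.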

Answer 2

Here is a one-liner via base R:

data.frame(t(combn(unique(data$var), 2, function(i)
                     list(v1 = i[[1]], 
                          v2 = i[[2]], 
                          value = cor(data$y[data$var %in% i[[1]]], 
                                      data$y[data$var %in% i[[2]]])))))

   X1 X2         X3
1   A  B   0.997249
2   A  C  0.7544987
3   A  D -0.7924587
4   A  E 0.03567887
5   B  C  0.8010711
6   B  D -0.7450683
7   B  E  0.1096579
8   C  D -0.1976141
9   C  E  0.6828033
10  D  E  0.5812632

Answer 3

1) Add an index column 1, 2, 3, 1, 2, 3, ... and then use read.zoo to convert from long to wide. Take the correlation, reshape it back to long form using as.data.frame.table, and filter out the desired rows.

library(dplyr)
library(zoo)

DF %>%
  mutate(index = sequence(rle(var)$lengths)) %>%
  read.zoo(index = "index", split = "var") %>%
  cor %>%
  as.data.frame.table(responseName = "cor") %>%
  filter(format(Var1) < format(Var2))
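
The individual steps can be inspected one at a time. The following is a rough base R walk-through on made-up data, with unstack standing in for read.zoo as the long-to-wide step (that substitution, and the names toy, idx, wide, long and res, are assumptions for illustration only):

```r
# Toy long-format data; values are chosen so the correlations are exactly +1 or -1
toy <- data.frame(var = rep(c("A", "B", "C"), each = 3),
                  y   = c(1, 2, 3,  2, 4, 6,  3, 2, 1),
                  stringsAsFactors = FALSE)

# Within-group index 1, 2, 3, 1, 2, 3, ... (the index column described above)
idx <- sequence(rle(toy$var)$lengths)

# Long to wide: one column per level of var
wide <- unstack(toy, y ~ var)

# Correlation matrix, reshaped back to long form (columns Var1, Var2, cor)
long <- as.data.frame.table(cor(wide), responseName = "cor")

# Keep each unordered pair once by comparing the labels as strings
res <- long[format(long$Var1) < format(long$Var2), ]
res
```

The string comparison in the last line drops both the diagonal and the mirrored pairs, which is what the filter call in the pipeline does.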

2) At the expense of one more line of code we can substitute pivot_wider for read.zoo.

library(dplyr)
library(tidyr)

DF %>%
  mutate(index = sequence(rle(var)$lengths)) %>%
  pivot_wider(id_cols = index, names_from = "var", values_from = "y") %>%
  select(-index) %>%
  cor %>%
  as.data.frame.table(responseName = "cor") %>%
  filter(format(Var1) < format(Var2))

3) A base solution uses combn to get the pairs of var and applies the function f shown below to each pair.

co <- combn(unique(DF$var), 2)
f <- function(v) with(DF, data.frame(t(v), cor = cor(y[var==v[1]], y[var==v[2]])))
do.call("rbind", apply(co, 2, f))

Note

The input in reproducible form:

DF <-
structure(list(var = c("A", "A", "A", "B", "B", "B", "C", "C", 
"C", "D", "D", "D", "E", "E", "E"), y = c(-1.23, -0.983, 1.28, 
-0.268, -0.46, -1.23, 1.87, 0.416, -1.99, 0.289, 1.7, -0.455, 
-0.648, 0.376, -0.887)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", 
"14", "15"))
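
The follow-up case in the question, with a second grouping variable, can be handled by running the same pairwise computation inside each group. A minimal sketch with dplyr, assuming the column names var2, group2 and y2 from the question; pair_cor is a hypothetical helper name and the seed is arbitrary:

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only to make the sketch reproducible
data2 <- tibble(
  var2   = rep(c("A", "B", "C", "D", "E", "F", "H", "I"), each = 3),
  group2 = rep(c(1, 2, 3, 1), each = 6),
  y2     = rnorm(24)
)

# Hypothetical helper: all pairwise correlations among the variables of one group
pair_cor <- function(d) {
  vars <- unique(d$var2)
  if (length(vars) < 2) {
    # A group with a single variable has no pairs
    return(tibble(var1 = character(), var2 = character(), value = numeric()))
  }
  pairs <- combn(vars, 2)  # one column per pair
  tibble(
    var1  = pairs[1, ],
    var2  = pairs[2, ],
    value = mapply(function(a, b) cor(d$y2[d$var2 == a], d$y2[d$var2 == b]),
                   pairs[1, ], pairs[2, ], USE.NAMES = FALSE)
  )
}

res <- data2 %>%
  group_by(group2) %>%
  group_modify(~ pair_cor(.x)) %>%  # group2 is added back as the first column
  ungroup()
res
```

With the question's layout this gives six pairs for group 1 (A, B, H, I) and one pair each for groups 2 and 3. The sketch assumes every variable within a group has the same number of observations, since cor needs vectors of equal length.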

Problem description

3 solutions

Solution 1
2 2022-07-15 13:47:19

Solution 2
2 2022-07-15 14:07:02

Solution 3
1 Accepted 2022-07-15 14:28:12
