Multiple tests with pairwise combinations using dplyr/tidyverse

Question

My question is related to this one but a more complex example, in which I would like to statistically compare multiple columns in all combinations, and each of the columns has a different number of samples.

Consider the original data:

# A tibble: 51 x 3
   trial person score
   <chr> <chr>  <dbl>
 1 foo   a      0.266
 2 bar   b      0.372
 3 foo   c      0.573
 4 bar   a      0.908
 5 foo   b      0.202
 6 bar   c      0.898
 7 foo   a      0.945
 8 bar   b      0.661
 9 foo   c      0.629
10 foo   b      0.206

For each trial type, I'd like to run a statistical test comparing the scores of each person. So, I need the following test results:

Trial foo , compare all score samples of persons A–B, B–C, C–A
Trial bar , compare all score samples of persons A–B, B–C, C–A

Of course, there are more than two trials, and more than three persons.

Hence, the solution using group_split given in the other question does not work, as it implies always testing agains the first person (in my case), not all pairwise combinations.

So, in the following code, I'm stuck at two points:

library(tidyverse)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang
library(broom)

set.seed(1)

df = tibble::tibble(
    trial = rep(c("foo", "bar"), 30),
    person = rep(c("a", "b", "c"), 20),
    score = runif(60)
  ) %>% 
  filter(score > 0.2)

df %>% 
  group_by(person, trial) %>% 
  summarize(scores = list(score)) %>% 
  spread(person, scores) %>%
  group_split(trial) %>% 
  map_df(function(data) {
    data %>% 
      summarize_at(vars(b:c), function(x) {
        wilcox.test(.$a, x, paired = FALSE) %>% broom::tidy
      })
  })
#> Error in wilcox.test.default(.$a, x, paired = FALSE): 'x' must be numeric

^{Created on 2019-05-29 by the reprex package (v0.3.0)}

The value of x is apparently not just the actual list of scores, but the column vector of scores for a single trial. But I don't know how else to deal with the fact that the number of samples in each person is different.

Also, I still have to manually specify the column names, which would already be a combinatorial nightmare if there were more than, say, four persons.

I can somehow get the combinations as such:

df %>% 
  group_split(trial) %>% 
  map_df(function(data) {
    combinations = expand(tibble(x = unique(data$person), y = unique(data$person)), x, y) %>% filter(x != y)
  })

… but that doesn't really help in creating columns for comparison.

What could I do to make this work?

Answer 1

This will allow you to programmatically specify combinations and get around the error you were hitting in wilcox.test() .

combos <- unique(df$person) %>%
  combn(2, simplify = F) %>%
  set_names(map_chr(., ~ paste(., collapse = "_")))

df %>% 
  group_split(trial) %>%
  set_names(map_chr(., ~ unique(.$trial))) %>% 
  map_df(function(x) {
    map_df(combos, function(y) {
      filter(x, person %in% y) %>% 
        wilcox.test(score ~ person, data = .) %>% 
        broom::tidy()
    }, .id = "contrast")
  }, .id = "trial")

# A tibble: 6 x 6
  trial contrast statistic p.value method                 alternative
  <chr> <chr>        <dbl>   <dbl> <chr>                  <chr>      
1 bar   a_b             34   0.878 Wilcoxon rank sum test two.sided  
2 bar   a_c             32   1     Wilcoxon rank sum test two.sided  
3 bar   b_c             31   0.959 Wilcoxon rank sum test two.sided  
4 foo   a_b             41   1     Wilcoxon rank sum test two.sided  
5 foo   a_c             41   1     Wilcoxon rank sum test two.sided  
6 foo   b_c             43   0.863 Wilcoxon rank sum test two.sided

Since this differs a lot from the pattern you started with, I'm not sure it will work for your real world case, but it works here so I wanted to share.

Answer 2

Here is an alternative solution that uses nesting to handle groups (persons) with different number of measurements.

library("broom")
library("tidyverse")

set.seed(1)

df <-
  tibble(
    trial = rep(c("foo", "bar"), 30),
    person = rep(c("a", "b", "c"), 20),
    score = runif(60)
  ) %>%
  filter(score > 0.2)

comparisons <- df %>%
  expand(
    trial,
    group1 = person,
    group2 = person
  ) %>%
  filter(
    group1 < group2
  )
comparisons
#> # A tibble: 6 × 3
#>   trial group1 group2
#>   <chr> <chr>  <chr> 
#> 1 bar   a      b     
#> 2 bar   a      c     
#> 3 bar   b      c     
#> 4 foo   a      b     
#> 5 foo   a      c     
#> 6 foo   b      c

df <- df %>% nest_by(trial, person)
df
#> # A tibble: 6 × 3
#> # Rowwise:  trial, person
#>   trial person               data
#>   <chr> <chr>  <list<tibble[,1]>>
#> 1 bar   a                 [8 × 1]
#> 2 bar   b                 [8 × 1]
#> 3 bar   c                 [8 × 1]
#> 4 foo   a                 [9 × 1]
#> 5 foo   b                 [9 × 1]
#> 6 foo   c                 [9 × 1]

comparisons %>%
  inner_join(
    df, by = c("trial", "group1" = "person")
  ) %>%
  inner_join(
    df, by = c("trial", "group2" = "person")
  ) %>%
  mutate(
    p.value = map2_dbl(
      data.x, data.y, ~ wilcox.test(.x$score, .y$score)$p.value
    )
  )
#> # A tibble: 6 × 6
#>   trial group1 group2             data.x             data.y p.value
#>   <chr> <chr>  <chr>  <list<tibble[,1]>> <list<tibble[,1]>>   <dbl>
#> 1 bar   a      b                 [8 × 1]            [8 × 1]   0.878
#> 2 bar   a      c                 [8 × 1]            [8 × 1]   1    
#> 3 bar   b      c                 [8 × 1]            [8 × 1]   0.959
#> 4 foo   a      b                 [9 × 1]            [9 × 1]   1    
#> 5 foo   a      c                 [9 × 1]            [9 × 1]   1    
#> 6 foo   b      c                 [9 × 1]            [9 × 1]   0.863

^{Created on 2022-03-17 by the reprex package (v2.0.1)}

Multiple tests with pairwise combinations using dplyr/tidyverse

Question

2 answers

solution1
1 ACCPTED 2019-05-29 14:05:02

solution2
1 2022-03-16 23:22:09

Multiple tests with pairwise combinations using dplyr/tidyverse

Question

2 answers

solution1 1 ACCPTED 2019-05-29 14:05:02

solution2 1 2022-03-16 23:22:09

solution1
1 ACCPTED 2019-05-29 14:05:02

solution2
1 2022-03-16 23:22:09