简体   繁体   中英

Multiple tests with pairwise combinations using dplyr/tidyverse

My question is related to this one but a more complex example, in which I would like to statistically compare multiple columns in all combinations, and each of the columns has a different number of samples.

Consider the original data:

# A tibble: 51 x 3
   trial person score
   <chr> <chr>  <dbl>
 1 foo   a      0.266
 2 bar   b      0.372
 3 foo   c      0.573
 4 bar   a      0.908
 5 foo   b      0.202
 6 bar   c      0.898
 7 foo   a      0.945
 8 bar   b      0.661
 9 foo   c      0.629
10 foo   b      0.206

For each trial type, I'd like to run a statistical test comparing the scores of each person. So, I need the following test results:

  • Trial foo , compare all score samples of persons A–B, B–C, C–A
  • Trial bar , compare all score samples of persons A–B, B–C, C–A

Of course, there are more than two trials, and more than three persons.

Hence, the solution using group_split given in the other question does not work, as it implies always testing agains the first person (in my case), not all pairwise combinations.

So, in the following code, I'm stuck at two points:

library(tidyverse)
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from 
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang
library(broom)

set.seed(1)

df = tibble::tibble(
    trial = rep(c("foo", "bar"), 30),
    person = rep(c("a", "b", "c"), 20),
    score = runif(60)
  ) %>% 
  filter(score > 0.2)

df %>% 
  group_by(person, trial) %>% 
  summarize(scores = list(score)) %>% 
  spread(person, scores) %>%
  group_split(trial) %>% 
  map_df(function(data) {
    data %>% 
      summarize_at(vars(b:c), function(x) {
        wilcox.test(.$a, x, paired = FALSE) %>% broom::tidy
      })
  })
#> Error in wilcox.test.default(.$a, x, paired = FALSE): 'x' must be numeric

Created on 2019-05-29 by the reprex package (v0.3.0)

The value of x is apparently not just the actual list of scores, but the column vector of scores for a single trial. But I don't know how else to deal with the fact that the number of samples in each person is different.

Also, I still have to manually specify the column names, which would already be a combinatorial nightmare if there were more than, say, four persons.

I can somehow get the combinations as such:

df %>% 
  group_split(trial) %>% 
  map_df(function(data) {
    combinations = expand(tibble(x = unique(data$person), y = unique(data$person)), x, y) %>% filter(x != y)
  })

… but that doesn't really help in creating columns for comparison.

What could I do to make this work?

This will allow you to programmatically specify combinations and get around the error you were hitting in wilcox.test() .

combos <- unique(df$person) %>%
  combn(2, simplify = F) %>%
  set_names(map_chr(., ~ paste(., collapse = "_")))

df %>% 
  group_split(trial) %>%
  set_names(map_chr(., ~ unique(.$trial))) %>% 
  map_df(function(x) {
    map_df(combos, function(y) {
      filter(x, person %in% y) %>% 
        wilcox.test(score ~ person, data = .) %>% 
        broom::tidy()
    }, .id = "contrast")
  }, .id = "trial")

# A tibble: 6 x 6
  trial contrast statistic p.value method                 alternative
  <chr> <chr>        <dbl>   <dbl> <chr>                  <chr>      
1 bar   a_b             34   0.878 Wilcoxon rank sum test two.sided  
2 bar   a_c             32   1     Wilcoxon rank sum test two.sided  
3 bar   b_c             31   0.959 Wilcoxon rank sum test two.sided  
4 foo   a_b             41   1     Wilcoxon rank sum test two.sided  
5 foo   a_c             41   1     Wilcoxon rank sum test two.sided  
6 foo   b_c             43   0.863 Wilcoxon rank sum test two.sided  

Since this differs a lot from the pattern you started with, I'm not sure it will work for your real world case, but it works here so I wanted to share.

Here is an alternative solution that uses nesting to handle groups (persons) with different number of measurements.

library("broom")
library("tidyverse")

set.seed(1)

df <-
  tibble(
    trial = rep(c("foo", "bar"), 30),
    person = rep(c("a", "b", "c"), 20),
    score = runif(60)
  ) %>%
  filter(score > 0.2)

comparisons <- df %>%
  expand(
    trial,
    group1 = person,
    group2 = person
  ) %>%
  filter(
    group1 < group2
  )
comparisons
#> # A tibble: 6 × 3
#>   trial group1 group2
#>   <chr> <chr>  <chr> 
#> 1 bar   a      b     
#> 2 bar   a      c     
#> 3 bar   b      c     
#> 4 foo   a      b     
#> 5 foo   a      c     
#> 6 foo   b      c

df <- df %>% nest_by(trial, person)
df
#> # A tibble: 6 × 3
#> # Rowwise:  trial, person
#>   trial person               data
#>   <chr> <chr>  <list<tibble[,1]>>
#> 1 bar   a                 [8 × 1]
#> 2 bar   b                 [8 × 1]
#> 3 bar   c                 [8 × 1]
#> 4 foo   a                 [9 × 1]
#> 5 foo   b                 [9 × 1]
#> 6 foo   c                 [9 × 1]

comparisons %>%
  inner_join(
    df, by = c("trial", "group1" = "person")
  ) %>%
  inner_join(
    df, by = c("trial", "group2" = "person")
  ) %>%
  mutate(
    p.value = map2_dbl(
      data.x, data.y, ~ wilcox.test(.x$score, .y$score)$p.value
    )
  )
#> # A tibble: 6 × 6
#>   trial group1 group2             data.x             data.y p.value
#>   <chr> <chr>  <chr>  <list<tibble[,1]>> <list<tibble[,1]>>   <dbl>
#> 1 bar   a      b                 [8 × 1]            [8 × 1]   0.878
#> 2 bar   a      c                 [8 × 1]            [8 × 1]   1    
#> 3 bar   b      c                 [8 × 1]            [8 × 1]   0.959
#> 4 foo   a      b                 [9 × 1]            [9 × 1]   1    
#> 5 foo   a      c                 [9 × 1]            [9 × 1]   1    
#> 6 foo   b      c                 [9 × 1]            [9 × 1]   0.863

Created on 2022-03-17 by the reprex package (v2.0.1)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM