
How to combine dplyr group_by, summarise, across and multiple function outputs?

I have the following grouped tibble:

library(tidyverse)

# Toy data: three cells measured for three genes, grouped by gene
tTest = tibble(Cells = rep(c("C1", "C2", "C3"), times = 3),
               Gene = rep(c("G1", "G2", "G3"), each = 3),
               Experiment_score = 1:9,
               Pattern1 = 1:9,
               Pattern2 = -(1:9),
               Pattern3 = 9:1) %>%
        group_by(Gene)

I want to correlate Experiment_score with each of the Pattern columns, separately for every Gene.

Looking at the tidyverse documentation pages and examples, I thought this would work:

# `corList` is a simple wrapper around `cor.test` that returns exactly two outputs:
corList = function(x, y) {
    result = cor.test(x, y)
    return(list(stat = result$estimate, pval = result$p.value))
}

tTest %>% summarise(across(starts_with("Pattern"), ~ corList(Experiment_score, .x), .names = "{.col}_corr_{.fn}"))

But this is not what I got (the output was posted as an image, which is not reproduced here).
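A likely reason the attempt misbehaves is the shape of the value that corList returns. summarise() reads each result as a vector whose length is the number of rows it contributes, so a plain list of length two is taken as two elements of a list-column rather than as two separate columns. The sketch below only inspects the return value on made-up vectors (x and y are illustrative and not from the post):

```r
# Same wrapper as in the question
corList = function(x, y) {
  result = cor.test(x, y)
  list(stat = result$estimate, pval = result$p.value)
}

# Illustrative data, not taken from the post
x = c(1, 2, 3, 4, 5)
y = c(2.1, 1.9, 3.5, 3.9, 5.2)

out = corList(x, y)

is.list(out)   # a bare list, not a data frame
length(out)    # two elements, which summarise() would read as two values
names(out)     # "stat" and "pval" are element names, not column names
```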

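I found a solution by melting (pivoting longer) the Pattern columns, which I post below for completeness. The problem is that I have dozens of Pattern columns and millions of rows. If I melt the Pattern columns, I end up with 5 billion rows, which severely limits my ability to work with the data.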

EDIT: my own, imperfect solution:

# `corVect` is a simple wrapper around `cor.test` that returns exactly two outputs:
corVect = function(x, y) {
    result = cor.test(x, y)
    return(c(stat = result$estimate, pval = result$p.value))
}

tTest %>% pivot_longer(starts_with("Pattern"), names_to = "Pattern", values_to = "Strength") %>%
      group_by(Gene, Pattern) %>%
      summarise(CorrVal = corVect(Experiment_score, Strength)) %>% 
      mutate(CorrType = c("corr", "corr_pval")) %>%
      # Reformat
      pivot_wider(id_cols = c(Gene, Pattern), names_from = CorrType, values_from = CorrVal)
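The melt-based approach above can probably be shortened. If the wrapper returns a one-row tibble and is passed to summarise() unnamed, its columns should become output columns directly, so the relabelling and pivot_wider steps are not needed. This is only a sketch: corRow is a hypothetical helper name, the noise and seed are added for illustration, and the long intermediate table is still built, so the memory concern remains.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # illustrative seed so the added noise is reproducible

tTest = tibble(
  Gene = rep(c("G1", "G2", "G3"), each = 3),
  Experiment_score = 1:9,
  Pattern1 = 1:9 + rnorm(9),
  Pattern2 = -(1:9 + rnorm(9)),
  Pattern3 = 9:1 + rnorm(9)
)

# Hypothetical helper: one row, two columns per call
corRow = function(x, y) {
  result = cor.test(x, y)
  tibble(corr = unname(result$estimate), corr_pval = result$p.value)
}

longResult = tTest %>%
  pivot_longer(starts_with("Pattern"), names_to = "Pattern", values_to = "Strength") %>%
  group_by(Gene, Pattern) %>%
  # An unnamed data-frame result is spliced into columns `corr` and `corr_pval`
  summarise(corRow(Experiment_score, Strength), .groups = "drop")

longResult
```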

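To get the desired result in one step, have the function return a tibble instead of a list, then set .unpack = TRUE in across (this argument is available in dplyr 1.1.0 and later). Here this is done with a conveniently named corTibble function: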

library(tidyverse)

tTest = tibble(
  Cells = rep(c("C1", "C2", "C3"), times = 3),
  Gene = rep(c("G1", "G2", "G3"), each = 3),
  Experiment_score = 1:9,
  Pattern1 = 1:9 + rnorm(9),  # added some noise
  Pattern2 = -(1:9 + rnorm(9)),
  Pattern3 = 9:1 + rnorm(9)
) %>%
  group_by(Gene)

corTibble = function(x, y) {
  result = cor.test(x, y)
  return(tibble(stat = result$estimate, pval = result$p.value))
}

tTest %>% summarise(across(
  starts_with("Pattern"),
  ~ corTibble(Experiment_score, .x),
  .names = "{.col}_corr",
  .unpack = TRUE
))

#> # A tibble: 3 × 7
#>   Gene  Pattern1_corr_stat Pattern1_corr_pval Pattern2…¹ Patte…² Patte…³ Patte…⁴
#>   <chr>              <dbl>              <dbl>      <dbl>   <dbl>   <dbl>   <dbl>
#> 1 G1                 0.947             0.208      -0.991  0.0866  -1.00   0.0187
#> 2 G2                 0.964             0.172      -0.872  0.325   -0.981  0.126 
#> 3 G3                 0.995             0.0668     -0.680  0.524   -0.409  0.732 
#> # … with abbreviated variable names ¹​Pattern2_corr_stat, ²​Pattern2_corr_pval,
#> #   ³​Pattern3_corr_stat, ⁴​Pattern3_corr_pval
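If a long layout with one row per Gene and Pattern is preferred, the wide summary can be reshaped afterwards. Because the summary has only one row per Gene, pivoting it is cheap compared with pivoting the raw data. The following is a sketch under assumptions: the names wideResult and longResult, the seed, and the regular expression are illustrative, and the column names are assumed to follow the Pattern1_corr_stat form shown in the output above.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # illustrative seed so the added noise is reproducible

tTest = tibble(
  Gene = rep(c("G1", "G2", "G3"), each = 3),
  Experiment_score = 1:9,
  Pattern1 = 1:9 + rnorm(9),
  Pattern2 = -(1:9 + rnorm(9)),
  Pattern3 = 9:1 + rnorm(9)
)

corTibble = function(x, y) {
  result = cor.test(x, y)
  tibble(stat = unname(result$estimate), pval = result$p.value)
}

# Wide summary: one row per Gene, two columns per Pattern (needs dplyr >= 1.1.0)
wideResult = tTest %>%
  group_by(Gene) %>%
  summarise(across(
    starts_with("Pattern"),
    ~ corTibble(Experiment_score, .x),
    .names = "{.col}_corr",
    .unpack = TRUE
  ))

# Split "Pattern1_corr_stat" into Pattern = "Pattern1" and a value column "stat"
longResult = wideResult %>%
  pivot_longer(
    -Gene,
    names_to = c("Pattern", ".value"),
    names_pattern = "(.*)_corr_(.*)"
  )

longResult
```

The ".value" entry in names_to tells pivot_longer to keep the second captured part of each name (stat or pval) as a column of its own instead of stacking it.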

