R: How to apply a function that outputs a data frame to multiple columns (using dplyr)?

Question

I want to find correlations, p-values and 95% CIs between one specific column and all other columns in a data frame. The 'broom' package provides an example of how to do that between two columns using cor.test with dplyr and pipes. For mtcars and, say, the mpg column, we can run a correlation with another column:

library(dplyr)
library(broom)
mtcars %>% do(tidy(cor.test(.$mpg, .$cyl)))

   estimate statistic      p.value parameter   conf.low  conf.high
1 -0.852162 -8.919699 6.112687e-10        30 -0.9257694 -0.7163171

The output is a single-row data frame. I'd like to run cor.test for mpg against each of the other columns and send each result to a separate row. When mpg is paired with every other column, the desired output would look like this:

       estimate statistic      p.value parameter   conf.low  conf.high
cyl   -0.852162 -8.919699 6.112687e-10        30 -0.9257694 -0.7163171
disp -0.8475514 -8.747152 9.380327e-10        30 -0.9233594 -0.7081376
hp   -0.7761684 -6.742389 1.787835e-07        30 -0.8852686 -0.5860994
drat  0.6811719  5.096042  1.77624e-05        30  0.4360484   0.832201
wt   -0.8676594 -9.559044 1.293959e-10        30 -0.9338264 -0.7440872
qsec   0.418684  2.525213   0.01708199        30 0.08195487  0.6696186
vs    0.6640389  4.864385 3.415937e-05        30   0.410363  0.8223262
am    0.5998324  4.106127 0.0002850207        30  0.3175583   0.784452
gear  0.4802848  2.999191  0.005400948        30  0.1580618  0.7100628
carb -0.5509251  -3.61575  0.001084446        30  -0.754648 -0.2503183

Note the added row names in the first column. They show which column was paired with mpg for the cor.test. Ideally, I'd like to do this with dplyr and pipes.

Answer 1

Here's a solution that sticks with the do approach. The step you're missing is to gather your data into long format and then group by the variable.

library(dplyr)
library(tidyr)
library(broom)

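mtcars %>%
  # Stack every column except mpg into key/value pairs (var, value)
  gather(var, value, -mpg) %>%
  group_by(var) %>%
  # Run cor.test once per original column and tidy each result into one row
  do(tidy(cor.test(.$mpg, .$value))) %>%
  ungroup() %>%
  # Restore the original column order of mtcars instead of alphabetical order
  mutate(var = factor(var, names(mtcars)[-1])) %>%
  arrange(var)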
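
In more recent tidyverse releases, gather() and do() are marked as superseded. As a rough sketch of the same idea, assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later, pivot_longer() and group_modify() can play the same roles. The name res is only illustrative and is not part of the original answer:

```r
library(dplyr)
library(tidyr)
library(broom)

res <- mtcars %>%
  # Long format: one row per (mpg, other-column value) pair
  pivot_longer(-mpg, names_to = "var", values_to = "value") %>%
  group_by(var) %>%
  # Apply cor.test to each group; the tidied one-row results are row-bound
  group_modify(~ tidy(cor.test(.x$mpg, .x$value))) %>%
  ungroup() %>%
  # Restore the original column order of mtcars
  arrange(match(var, names(mtcars)))

res
```

Here var ends up as an ordinary character column rather than a factor, which is why the ordering is done with match() instead of factor levels.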

And here's an example that is closer to a base R approach (I used pipes for convenience, but it is easily adapted):

library(dplyr)
library(broom)

xvar <- "mpg"
yvar <- names(mtcars)[!names(mtcars) %in% xvar]

lapply(yvar,
       function(yvar, xvar, DF)
       {
         cor.test(DF[[xvar]], DF[[yvar]]) %>%
           tidy()
       },
       xvar,
       mtcars) %>%
  bind_rows() %>%
  mutate(yvar = yvar) %>%
  select(yvar, estimate:conf.high)
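
If the row names shown in the question matter, a minimal base R sketch (no dplyr or broom) could assemble each row by hand from the htest object that cor.test returns. The helper name cor_row and the variables yvars and out are hypothetical, introduced only for illustration:

```r
xvar  <- "mpg"
yvars <- setdiff(names(mtcars), xvar)

# Hypothetical helper: flatten one cor.test result into a one-row data frame
cor_row <- function(y) {
  ct <- cor.test(mtcars[[xvar]], mtcars[[y]])
  data.frame(
    estimate  = unname(ct$estimate),
    statistic = unname(ct$statistic),
    p.value   = ct$p.value,
    parameter = unname(ct$parameter),
    conf.low  = ct$conf.int[1],
    conf.high = ct$conf.int[2]
  )
}

# One row per paired column, labelled with that column's name
out <- do.call(rbind, lapply(yvars, cor_row))
rownames(out) <- yvars
out
```

This keeps the paired column as a row name, as in the desired output, whereas the dplyr versions above store it in a regular column.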
