R: How to apply a function that outputs a dataframe for multiple columns (using dplyr)?

I want to find correlations, p-values, and 95% confidence intervals between one specific column and every other column in a dataframe. The 'broom' package provides an example of how to do that for two columns using cor.test with dplyr and pipes. For mtcars and, say, the mpg column, we can run a correlation against one other column:

library(dplyr)
library(broom)
mtcars %>% do(tidy(cor.test(.$mpg, .$cyl)))

   estimate statistic      p.value parameter   conf.low  conf.high
1 -0.852162 -8.919699 6.112687e-10        30 -0.9257694 -0.7163171

The output is a single-row dataframe. I'd like to run cor.test for mpg against each of the other columns and send each result to a separate row. When the mpg column is paired with every other column, the desired output would look like this:

       estimate statistic      p.value parameter   conf.low  conf.high
cyl   -0.852162 -8.919699 6.112687e-10        30 -0.9257694 -0.7163171
disp -0.8475514 -8.747152 9.380327e-10        30 -0.9233594 -0.7081376
hp   -0.7761684 -6.742389 1.787835e-07        30 -0.8852686 -0.5860994
drat  0.6811719  5.096042  1.77624e-05        30  0.4360484   0.832201
wt   -0.8676594 -9.559044 1.293959e-10        30 -0.9338264 -0.7440872
qsec   0.418684  2.525213   0.01708199        30 0.08195487  0.6696186
vs    0.6640389  4.864385 3.415937e-05        30   0.410363  0.8223262
am    0.5998324  4.106127 0.0002850207        30  0.3175583   0.784452
gear  0.4802848  2.999191  0.005400948        30  0.1580618  0.7100628
carb -0.5509251  -3.61575  0.001084446        30  -0.754648 -0.2503183

Note the added row names in the first column. They show which column was paired with mpg for the cor.test. Ideally, I'd like to do this with dplyr and pipes.

Here's a solution that sticks with the do approach. The step you're missing is to gather your data into long format and then group by the variable.

library(dplyr)
library(tidyr)
library(broom)

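mtcars %>%
  # reshape to long format: one row per (mpg, var, value) triple
  gather(var, value, -mpg) %>%
  group_by(var) %>%
  # run cor.test once per variable and tidy each result into one row
  do(tidy(cor.test(.$mpg, .$value))) %>%
  ungroup() %>%
  # restore the original column order (names(mtcars)[-1] drops mpg)
  mutate(var = factor(var, names(mtcars)[-1])) %>%
  arrange(var)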
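
For readers on recent versions of the tidyverse: gather() and do() are both marked as superseded, so the same reshape-then-group idea can also be written with pivot_longer() and group_modify(). The following is only a sketch of that variant, not part of the original answer; the name `result` is introduced here for illustration.

```r
library(dplyr)
library(tidyr)
library(broom)

result <- mtcars %>%
  # long format: every non-mpg column becomes (var, value) pairs next to mpg
  pivot_longer(-mpg, names_to = "var", values_to = "value") %>%
  group_by(var) %>%
  # .x holds the rows of one group, without the grouping column
  group_modify(~ tidy(cor.test(.x$mpg, .x$value))) %>%
  ungroup() %>%
  # keep the rows in the original column order of mtcars
  mutate(var = factor(var, levels = setdiff(names(mtcars), "mpg"))) %>%
  arrange(var)

result
```

As with the do version, each row holds the tidied cor.test result for one variable, with the variable name in the `var` column instead of in row names.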

And here's an example that's closer to a base R approach (I used pipes for convenience, but it's easily adapted):

library(dplyr)
library(broom)

xvar <- "mpg"
yvar <- names(mtcars)[!names(mtcars) %in% xvar]

lapply(yvar,
       function(yvar, xvar, DF)
       {
         cor.test(DF[[xvar]], DF[[yvar]]) %>%
           tidy()
       },
       xvar,
       mtcars) %>%
  bind_rows() %>%
  mutate(yvar = yvar) %>%
  select(yvar, estimate:conf.high)
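
If the row names from the question's desired output matter, a version with no packages at all is also possible. This is a rough sketch under the assumption that pulling the fields straight out of the object returned by cor.test is acceptable; the helper `cor_row` and the variable `result` are hypothetical names, not from the answer above.

```r
xvar  <- "mpg"
yvars <- setdiff(names(mtcars), xvar)

# Hypothetical helper: one cor.test result as a one-row data frame,
# with the paired column's name as the row name.
cor_row <- function(y, x, df) {
  ct <- cor.test(df[[x]], df[[y]])
  data.frame(
    estimate  = unname(ct$estimate),
    statistic = unname(ct$statistic),
    p.value   = ct$p.value,
    parameter = unname(ct$parameter),
    conf.low  = ct$conf.int[1],
    conf.high = ct$conf.int[2],
    row.names = y
  )
}

# One row per column paired with mpg, stacked into a single data frame
result <- do.call(rbind, lapply(yvars, cor_row, x = xvar, df = mtcars))
result
```

Here the paired column ends up in the row names, as in the question, rather than in a separate `yvar` column.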
