R: How to apply a function that outputs a data frame to multiple columns (using dplyr)?
I want to find the correlation, p-value, and 95% CI between one specific column and every other column in a data frame. The broom package provides an example of how to do this for two columns using cor.test with dplyr and pipes. For mtcars and, say, the mpg column, we can run a correlation with one other column:
library(dplyr)
library(broom)
mtcars %>% do(tidy(cor.test(.$mpg, .$cyl)))
estimate statistic p.value parameter conf.low conf.high
1 -0.852162 -8.919699 6.112687e-10 30 -0.9257694 -0.7163171
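To see where those columns come from: broom's `tidy()` reads them off the `htest` object that `cor.test()` returns. A minimal base R sketch, using the field names documented in `?cor.test`. The one-row data frame `one_row` is illustrative only; broom's actual output also includes `method` and `alternative` columns.

```r
# Run the same test as above on one pair of columns
ct <- cor.test(mtcars$mpg, mtcars$cyl)

# Each quantity that tidy() turns into a column is stored on the htest object
one_row <- data.frame(
  estimate  = unname(ct$estimate),   # Pearson's r
  statistic = unname(ct$statistic),  # t statistic
  p.value   = ct$p.value,
  parameter = unname(ct$parameter),  # degrees of freedom (n - 2)
  conf.low  = ct$conf.int[1],        # lower bound of the 95% CI
  conf.high = ct$conf.int[2]         # upper bound of the 95% CI
)
one_row
```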
The output is a single-row data frame. I'd like to run cor.test for mpg against each of the other columns and send each result to a separate row. When mpg is paired with every other column, the desired output would look like this:
estimate statistic p.value parameter conf.low conf.high
cyl -0.852162 -8.919699 6.112687e-10 30 -0.9257694 -0.7163171
disp -0.8475514 -8.747152 9.380327e-10 30 -0.9233594 -0.7081376
hp -0.7761684 -6.742389 1.787835e-07 30 -0.8852686 -0.5860994
drat 0.6811719 5.096042 1.77624e-05 30 0.4360484 0.832201
wt -0.8676594 -9.559044 1.293959e-10 30 -0.9338264 -0.7440872
qsec 0.418684 2.525213 0.01708199 30 0.08195487 0.6696186
vs 0.6640389 4.864385 3.415937e-05 30 0.410363 0.8223262
am 0.5998324 4.106127 0.0002850207 30 0.3175583 0.784452
gear 0.4802848 2.999191 0.005400948 30 0.1580618 0.7100628
carb -0.5509251 -3.61575 0.001084446 30 -0.754648 -0.2503183
Note the added row names in the first column: they show which column was paired with mpg for each cor.test. Ideally, I'd like to do this with dplyr and pipes.
Here's a solution that sticks with the do approach. The step you're missing is to gather your data into long format and then group by the variable.
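library(dplyr)
library(tidyr)
library(broom)
mtcars %>%
  # reshape to long format: one row per (var, value), keeping mpg as a column
  gather(var, value, -mpg) %>%
  # run one cor.test per original column
  group_by(var) %>%
  do(tidy(cor.test(.$mpg, .$value))) %>%
  ungroup() %>%
  # restore the original column order (grouping sorts alphabetically);
  # mpg is the first column, so names(mtcars)[-1] lists the others
  mutate(var = factor(var, names(mtcars)[-1])) %>%
  arrange(var)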
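In more recent tidyverse releases, `gather()` is superseded by `pivot_longer()`, and dplyr's documentation marks `do()` as superseded as well. A sketch of the same gather-then-group idea using `pivot_longer()` and `group_modify()` as the swapped-in replacements, assuming tidyr >= 1.0.0 and dplyr >= 0.8.1 (not checked against every version):

```r
library(dplyr)
library(tidyr)
library(broom)

res <- mtcars %>%
  # long format: one row per (column name, value), keeping mpg alongside
  pivot_longer(-mpg, names_to = "var", values_to = "value") %>%
  group_by(var) %>%
  # .x is each group's data frame (mpg and value); tidy() returns one row
  group_modify(~ tidy(cor.test(.x$mpg, .x$value))) %>%
  ungroup() %>%
  # restore the original column order without relying on mpg being first
  mutate(var = factor(var, levels = setdiff(names(mtcars), "mpg"))) %>%
  arrange(var)
res
```

`group_modify()` expects the function to return a data frame, which `tidy()` does, so each group contributes exactly one row.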
And here's an example that's closer to a base R approach (I used pipes for convenience, but it's easily adapted to work without them):
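library(dplyr)
library(broom)
xvar <- "mpg"
# every column except mpg
yvar <- names(mtcars)[!names(mtcars) %in% xvar]
# run cor.test(mpg, y) for each y; xvar and DF are passed through lapply's ...
lapply(yvar,
       function(yvar, xvar, DF)
       {
         cor.test(DF[[xvar]], DF[[yvar]]) %>%
           tidy()
       },
       xvar = xvar,
       DF = mtcars) %>%
  # stack the one-row data frames
  bind_rows() %>%
  # label each row with its paired column (rows follow the order of yvar)
  mutate(yvar = yvar) %>%
  select(yvar, estimate:conf.high)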
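Since the lapply() approach is easily adapted away from pipes, here is one possible fully base R version (no dplyr or broom) that also reproduces the row-name layout of the desired output. It assumes you only need the six numeric columns from the question; the helper `cor_row` is a hypothetical name introduced for illustration.

```r
xvar <- "mpg"
yvars <- setdiff(names(mtcars), xvar)

# Hypothetical helper: run cor.test for one column, return a one-row data frame
cor_row <- function(yvar, xvar, DF) {
  ct <- cor.test(DF[[xvar]], DF[[yvar]])
  data.frame(
    estimate  = unname(ct$estimate),
    statistic = unname(ct$statistic),
    p.value   = ct$p.value,
    parameter = unname(ct$parameter),
    conf.low  = ct$conf.int[1],
    conf.high = ct$conf.int[2]
  )
}

# One row per paired column, stacked with rbind
res <- do.call(rbind, lapply(yvars, cor_row, xvar = xvar, DF = mtcars))
# Use the paired column names as row names, as in the desired output
rownames(res) <- yvars
res
```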