Aggregate function in R not assigning to df correctly

I have a dataframe in R with columns named a-g, where columns a and b are non-numeric and the rest are numeric.

When I run the following line in the console, it works as intended, giving me the mean, standard deviation, and n of each of the variables:

library(dplyr)  # provides %>% and select()
df %>%
  select(a, b, c, d, e) %>%
  aggregate(. ~ a + b, data = .,
            FUN = function(x) c(avg = mean(x), std = sd(x, na.rm = TRUE), n = length(x)))

However, when I try to assign the output to a dataframe, it only runs the mean function and doesn't create the columns for standard deviation or n. Why does this happen?

Since we are already using dplyr, group_by together with summarise/mutate can produce the expected output:

library(dplyr)
df %>%
   select(a, b, c, d, e) %>%
   group_by(a, b) %>%
   mutate(n = n()) %>%
   group_by(n, .add = TRUE) %>%                # `add` became `.add` in dplyr 1.0.0
   summarise_all(list(mean = mean, sd = sd))   # funs() is deprecated; pass a named list
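For readers on a recent dplyr, the same grouped summary can also be written with across(). This is only a sketch, assuming dplyr 1.0.0 or later and the sample data from this answer; the names out, avg and std are chosen here for illustration.

```r
library(dplyr)

# Sample data with the same shape as in the question:
# a and b are grouping columns, c, d and e are numeric.
set.seed(24)
df <- data.frame(a = rep(LETTERS[1:3], each = 3),
                 b = sample(letters[1:2], 9, replace = TRUE),
                 c = rnorm(9), d = rnorm(9), e = rnorm(9))

out <- df %>%
  group_by(a, b) %>%
  summarise(
    # One mean and one sd column per variable, named c_avg, c_std, ...
    across(c:e, list(avg = mean, std = sd)),
    # Group size, computed once per group rather than once per variable
    n = n(),
    .groups = "drop"
  )

names(out)
```

Each statistic ends up in its own ordinary column, so no flattening step is needed afterwards. Groups with a single row still get NA for the standard deviation, as with aggregate.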

As for why aggregate behaves differently: FUN concatenates the output of several functions into one vector (avg, std, n) per group, so aggregate stores the result for each of 'c', 'd' and 'e' as a single column holding a three-column matrix, not as three separate columns.

str(res)
#'data.frame':   5 obs. of  5 variables:
# $ a: Factor w/ 3 levels "A","B","C": 1 3 1 2 3
# $ b: Factor w/ 2 levels "a","b": 1 1 2 2 2
# $ c: num [1:5, 1:3] -0.495 0.131 0.448 -0.495 -0.3 ...
#  ..- attr(*, "dimnames")=List of 2
#  .. ..$ : NULL
#  .. ..$ : chr  "avg" "std" "n"
# $ d: num [1:5, 1:3] -0.713 1.868 -0.71 -0.508 -0.545 ...
#  ..- attr(*, "dimnames")=List of 2
#  .. ..$ : NULL
#  .. ..$ : chr  "avg" "std" "n"
# $ e: num [1:5, 1:3] -0.893 -0.546 -0.421 1.572 -0.867 ...
#  ..- attr(*, "dimnames")=List of 2
#  .. ..$ : NULL
#  .. ..$ : chr  "avg" "std" "n"

where res is the object the output of the OP's code was assigned to.
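The matrix columns can also be inspected directly. The following is a minimal sketch that rebuilds res from this answer's sample data using base R only; the exact values depend on the R version's random number generator, so only the structure is checked.

```r
# Sample data with the same shape as in the question
set.seed(24)
df <- data.frame(a = rep(LETTERS[1:3], each = 3),
                 b = sample(letters[1:2], 9, replace = TRUE),
                 c = rnorm(9), d = rnorm(9), e = rnorm(9))

# FUN returns a named vector of length 3 for every group
res <- aggregate(. ~ a + b, data = df,
                 FUN = function(x) c(avg = mean(x), std = sd(x), n = length(x)))

ncol(res)          # 5: a, b, c, d, e (not 11)
is.matrix(res$c)   # TRUE: the three statistics live inside one column
colnames(res$c)    # "avg" "std" "n"

# Individual statistics are reached by indexing into the matrix column
res$c[, "std"]
```

This is why the assigned data frame appears to have only one column per variable: the standard deviation and n are present, but nested inside the matrix.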

To convert these matrix columns into ordinary data.frame columns, use

res1 <- do.call(data.frame, res)
str(res1)
#'data.frame':   5 obs. of  11 variables:
# $ a    : Factor w/ 3 levels "A","B","C": 1 3 1 2 3
# $ b    : Factor w/ 2 levels "a","b": 1 1 2 2 2
# $ c.avg: num  -0.495 0.131 0.448 -0.495 -0.3
# $ c.std: num  0.233 NA NA 1.589 1.116
# $ c.n  : num  2 1 1 3 2
# $ d.avg: num  -0.713 1.868 -0.71 -0.508 -0.545
# $ d.std: num  1.365 NA NA 0.727 0.322
# $ d.n  : num  2 1 1 3 2
# $ e.avg: num  -0.893 -0.546 -0.421 1.572 -0.867
# $ e.std: num  0.771 NA NA 1.371 0.255
# $ e.n  : num  2 1 1 3 2

Data

set.seed(24)
df <- data.frame(a = rep(LETTERS[1:3], each = 3), 
   b = sample(letters[1:2], 9, replace = TRUE), 
   c = rnorm(9), d = rnorm(9), e = rnorm(9))
