Apply a function to each group using dplyr

Question

df <- data.frame(group = rep(1:4, each = 10), 
                   x1 = rnorm(40),  x2 = rnorm(40), x3 = rnorm(40), x4 = rnorm(40), 
                   X5 = rnorm(40), x6 = rnorm(40), x7 = rnorm(40))

sapply(df[, 4:ncol(df)], function(x) sd(x)/mean(x))

I want to apply this function to each group. How do I correct the command below?

df %>% dplyr::group_by(group) %>% do.call(sapply(.[, 4:ncol(.)] function(x) sd(x)/mean(x)))
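For reference, 4:ncol(df) starts at the fourth column. The question's sapply() call therefore skips group, x1 and x2 and only touches x3 through x7, which is why both answers return exactly those five columns. A small sketch that makes the selection visible (cv is a hypothetical helper name, not something from the question):

```r
# Same data layout as in the question (values are random)
df <- data.frame(group = rep(1:4, each = 10),
                 x1 = rnorm(40), x2 = rnorm(40), x3 = rnorm(40), x4 = rnorm(40),
                 X5 = rnorm(40), x6 = rnorm(40), x7 = rnorm(40))

# Hypothetical helper: coefficient of variation (sd divided by mean)
cv <- function(x) sd(x) / mean(x)

# Column 1 is group and columns 2-3 are x1 and x2, so 4:ncol(df) is x3..x7
selected <- names(df)[4:ncol(df)]
print(selected)  # "x3" "x4" "X5" "x6" "x7"

# Ungrouped CV per selected column, equivalent to the question's sapply() call
overall <- sapply(df[selected], cv)
print(overall)
```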

Answer 1

If I understood your question/objective, the following will give the results you're seeking. It uses the plyr package instead of dplyr. You're likely also running into issues using %>% with do.call: %>% is just a shortcut for passing the preceding object as the first argument to the subsequent function, while do.call expects a function (or the name of one) as its first argument.

library(plyr)

df <- data.frame(group = rep(1:4, each = 10), 
                 x1 = rnorm(40),  x2 = rnorm(40), x3 = rnorm(40), x4 = rnorm(40), 
                 X5 = rnorm(40), x6 = rnorm(40), x7 = rnorm(40))

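# Split df by group, compute sd/mean for columns 4..ncol (x3..x7) in each
# piece, and bind the per-group results back into one data frame
ddply(df, .(group), function(x) {
  sapply(x[, 4:ncol(x)], function(y) sd(y) / mean(y))
})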

This gives the following results (the data are generated without a seed, so your numbers will differ):

 group        x3        x4        X5         x6        x7
1     1  1.650401 -1.591829  1.509770   6.464991  3.520367
2     2 11.491301 -2.326737 -1.725810 -11.712510  2.293093
3     3 -3.623159 -1.416755  2.958689   1.629667 -4.318230
4     4  9.169641 -4.219095  2.083300   1.985500 -1.678107
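Since the question asks for dplyr specifically, here is a sketch of a dplyr-only version, assuming dplyr >= 1.0.0 (where across() is available). The range x3:x7 mirrors 4:ncol(df), and the result has one row per group, like the ddply() output above. (Separately, the command in the question also has a syntax error: a comma is missing before function(x).)

```r
library(dplyr)  # assumes dplyr >= 1.0.0, which provides across()

set.seed(1)  # arbitrary seed, only so this sketch is reproducible
df <- data.frame(group = rep(1:4, each = 10),
                 x1 = rnorm(40), x2 = rnorm(40), x3 = rnorm(40), x4 = rnorm(40),
                 X5 = rnorm(40), x6 = rnorm(40), x7 = rnorm(40))

# group_by() + summarise() gives one row per group; across() applies
# sd/mean to every column from x3 through x7 within each group
res <- df %>%
  group_by(group) %>%
  summarise(across(x3:x7, ~ sd(.x) / mean(.x)))

print(res)
```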

Answer 2

Consider base R's by() (an object-oriented wrapper for tapply()):

Data (seeded for reproducibility)

set.seed(3219)
df <- data.frame(group = rep(1:4, each = 10), 
                   x1 = rnorm(40),  x2 = rnorm(40), x3 = rnorm(40), x4 = rnorm(40), 
                   X5 = rnorm(40), x6 = rnorm(40), x7 = rnorm(40))

by

by_list <- by(df, df$group, function(sub) 
    sapply(sub[, 4:ncol(sub)], function(x) sd(x)/mean(x))
)

# LIST
by_list 
# df$group: 1
#        x3        x4        X5        x6        x7 
# -1.077354  2.252270 -2.256086 -1.716327 -5.273771 
# ------------------------------------------------------------ 
# df$group: 2
#         x3         x4         X5         x6         x7 
#   2.580065   5.054094 -10.985927  32.716116   6.732901 
# ------------------------------------------------------------ 
# df$group: 3
#         x3         x4         X5         x6         x7 
#  -3.523565  -1.670539  -5.042595  -7.787303 -15.486737 
# ------------------------------------------------------------ 
# df$group: 4
#        x3        x4        X5        x6        x7 
# -5.597470 -9.842997  1.985010 33.657188  2.629724 

# MATRIX
do.call(rbind, by_list)

#          x3        x4         X5        x6         x7
# 1 -1.077354  2.252270  -2.256086 -1.716327  -5.273771
# 2  2.580065  5.054094 -10.985927 32.716116   6.732901
# 3 -3.523565 -1.670539  -5.042595 -7.787303 -15.486737
# 4 -5.597470 -9.842997   1.985010 33.657188   2.629724
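To make the "wrapper for tapply" remark concrete: for a data frame, by() passes the row numbers to tapply(), grouped by the index, and calls the function on each subset of rows. A simplified sketch of that mechanism, checked against by() on the same seeded data (cv_cols is a hypothetical helper name):

```r
set.seed(3219)
df <- data.frame(group = rep(1:4, each = 10),
                 x1 = rnorm(40), x2 = rnorm(40), x3 = rnorm(40), x4 = rnorm(40),
                 X5 = rnorm(40), x6 = rnorm(40), x7 = rnorm(40))

# Hypothetical helper: per-column sd/mean for columns 4..ncol of a subset
cv_cols <- function(sub) sapply(sub[, 4:ncol(sub)], function(x) sd(x) / mean(x))

# tapply() over row numbers: each group's row indices are passed to FUN,
# which subsets the data frame and computes the per-column CV
tap_list <- tapply(seq_len(nrow(df)), df$group, function(i) cv_cols(df[i, ]))

# Bind into a matrix with one row per group, as with do.call(rbind, by_list)
tap_mat <- do.call(rbind, tap_list)
by_mat <- do.call(rbind, by(df, df$group, cv_cols))

print(all.equal(tap_mat, by_mat))
```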

Answer 1: 3 votes, accepted, 2019-03-02 21:57:43
Answer 2: 1 vote, 2019-03-02 22:34:40
