
R - Efficient way to apply function on all features grouped using plyr

Let df be our test data frame:

set.seed(1)
df <- data.frame(id = c(1,1,2,2,3,3,3),
                 var1 = round(runif(7)),
                 var2 = round(runif(7)),
                 var3 = round(runif(7)))
df

  id var1 var2 var3
1  1    0    1    1
2  1    0    1    0
3  2    1    0    1
4  2    1    0    1
5  3    0    0    0
6  3    1    1    1
7  3    1    0    1

I want to group by id and sum all the values within each group, like this:

library(dplyr)

df %>% 
  group_by(id) %>% 
  summarise(sum_var_1 = sum(var1),
            sum_var_2 = sum(var2),
            sum_var_3 = sum(var3)) %>% 
  data.frame

  id sum_var_1 sum_var_2 sum_var_3
1  1         0         2         1
2  2         2         0         2
3  3         2         1         2

Now the question: Is there a way to avoid the sum_var_2 = sum(var2) [...] step and do it functionally inside the summarise, with something like a formula? There are hundreds of features I'd like to sum up!

Any help would be much appreciated!

Since all your variables start with "var", you can do:

df %>% 
   group_by(id) %>% 
   summarise_at(vars(starts_with("var")), sum)

This returns the sums from your example, with the columns keeping their original names (var1, var2, var3).
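If the sum_ prefix from the question matters, the same selection can be sketched with across(), which supersedes summarise_at in newer dplyr releases. This is a minimal sketch assuming dplyr 1.0.0 or later; the data frame is typed out from the printed table in the question rather than regenerated with runif, and res is just an illustrative name.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() is available

# Same values as the printed df in the question, written out literally
df <- data.frame(id   = c(1, 1, 2, 2, 3, 3, 3),
                 var1 = c(0, 0, 1, 1, 0, 1, 1),
                 var2 = c(1, 1, 0, 0, 0, 1, 0),
                 var3 = c(1, 0, 1, 1, 0, 1, 1))

# Sum every column whose name starts with "var" within each id;
# .names builds the output names from the input column names
res <- df %>%
  group_by(id) %>%
  summarise(across(starts_with("var"), sum, .names = "sum_{.col}")) %>%
  data.frame()

res
```

Note that the resulting columns are named sum_var1, sum_var2 and sum_var3, which is close to but not identical to the sum_var_1 style used in the question.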

Edit: As @jake-kaupp commented, summarise_all does the job even better, and it does not require the variable names to follow a common pattern:

df %>% 
   group_by(id) %>% 
   summarise_all(sum)
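One caveat with summing every column: sum() raises an error on character columns, so summarise_all(sum) only works when all non-grouping columns are numeric. With hundreds of features that may not hold. The sketch below restricts the sum to numeric columns using across(where(is.numeric), ...), assuming dplyr 1.0.0 or later; the label column and the names df2 and res2 are hypothetical, added only to illustrate the point.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() is available

# Values from the question's df, plus a hypothetical character column
df2 <- data.frame(id    = c(1, 1, 2, 2, 3, 3, 3),
                  var1  = c(0, 0, 1, 1, 0, 1, 1),
                  var2  = c(1, 1, 0, 0, 0, 1, 0),
                  var3  = c(1, 0, 1, 1, 0, 1, 1),
                  label = c("a", "b", "c", "d", "e", "f", "g"),
                  stringsAsFactors = FALSE)

# Sum only the numeric columns within each id; the grouping column is
# never selected by across(), and the character column is left out
res2 <- df2 %>%
  group_by(id) %>%
  summarise(across(where(is.numeric), sum)) %>%
  data.frame()

res2
```

The result has the columns id, var1, var2 and var3; the label column does not appear because it was not selected for summarising.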
