R - Efficient way to apply function on all features grouped using dplyr
Let df be our test data frame:
set.seed(1)
df <- data.frame(id = c(1,1,2,2,3,3,3),
                 var1 = round(runif(7)),
                 var2 = round(runif(7)),
                 var3 = round(runif(7)))
df
id var1 var2 var3
1 1 0 1 1
2 1 0 1 0
3 2 1 0 1
4 2 1 0 1
5 3 0 0 0
6 3 1 1 1
7 3 1 0 1
I want to group by id and sum up all values like this:
library(dplyr)

df %>%
  group_by(id) %>%
  summarise(sum_var_1 = sum(var1),
            sum_var_2 = sum(var2),
            sum_var_3 = sum(var3)) %>%
  data.frame
id sum_var_1 sum_var_2 sum_var_3
1 1 0 2 1
2 2 2 0 2
3 3 2 1 2
Now the question: is there a way to avoid the sum_var_2 = sum(var2) [...] step and do it functionally inside summarise, with something like a formula? There are hundreds of features I'd like to sum up!

Any help would be very appreciated!
Since all your variables start with "var", you can do:
df %>%
  group_by(id) %>%
  summarise_at(vars(starts_with("var")), sum)
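which returns the same sums as your example, although the result columns keep their original names (var1, var2, var3) instead of sum_var_1 and so on.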
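In dplyr 1.0.0 and later, the scoped verbs summarise_at() and summarise_all() are marked as superseded in favour of across() inside a plain summarise(). The block below is a minimal sketch of the same aggregation, assuming dplyr >= 1.0.0 is installed. The .names template "sum_{.col}" is an illustrative choice to get prefixed column names closer to the question's style; note it produces sum_var1, not sum_var_1.

```r
library(dplyr)

set.seed(1)
df <- data.frame(id = c(1,1,2,2,3,3,3),
                 var1 = round(runif(7)),
                 var2 = round(runif(7)),
                 var3 = round(runif(7)))

# across() applies sum() to every column whose name starts with "var".
# .names = "sum_{.col}" prefixes each output column (sum_var1, sum_var2, ...).
res <- df %>%
  group_by(id) %>%
  summarise(across(starts_with("var"), sum, .names = "sum_{.col}")) %>%
  as.data.frame()

print(res)
```

Swapping starts_with("var") for everything() sums every non-grouping column, because summarise() leaves the grouping variable id out of across().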
Edit: As @jake-kaupp commented, summarise_all does the job even better: it applies the function to every non-grouping column, so the variable names do not need to share a common prefix:
df %>%
  group_by(id) %>%
  summarise_all(sum)
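For comparison, the same "sum every non-grouping column by id" pattern can be expressed without dplyr. This is a sketch using base R's aggregate() with a dot formula, offered as an alternative rather than as part of the answer above; it assumes only base R.

```r
set.seed(1)
df <- data.frame(id = c(1,1,2,2,3,3,3),
                 var1 = round(runif(7)),
                 var2 = round(runif(7)),
                 var3 = round(runif(7)))

# ". ~ id" means: every column other than id, grouped by id.
# FUN = sum is applied to each of those columns within each group.
res <- aggregate(. ~ id, data = df, FUN = sum)

print(res)
```

One caveat of the formula method: its default na.action is na.omit, so any row with an NA in any column is dropped before summing, whereas the dplyr versions return NA for that group unless you pass na.rm = TRUE.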