
Aggregating two columns in R dataframe

I have a data frame in R called food:

foodID   calories   fat    protein
 123       0.5      0.4     0.9
 432       0.65     0.3     0.7
 123       0.32     0.6     0.5
 983       0.82     0.2     0.6

and I'm trying to average the calories and protein columns by foodID.

I tried:

cal_pro <- aggregate(food[2,4], list(food$foodID), function(df) mean(df))

But it appears that I can't use food[2,4] to select the columns that the mean function should be applied to. Could anyone help me out with this?

Using dplyr, you can just group_by and summarize:

library(dplyr)

food %>%
    group_by(foodID) %>%
    summarize(calories_average = mean(calories),
              protein_average = mean(protein))

# A tibble: 3 x 3
  foodID calories_average protein_average
   <int>            <dbl>           <dbl>
1    123             0.41             0.7
2    432             0.65             0.7
3    983             0.82             0.6

Rather than specifying each variable, you can use summarize_at to select multiple variables to summarize at once. We pass in 2 arguments: the variables to summarize, and a list of functions to apply to them. If the list is named, as it is here, then the name is added to each summary column as a suffix (giving "calories_average" and "protein_average"):

food %>%
    group_by(foodID) %>%
    summarize_at(c('calories', 'protein'), list(average = mean))

summarize_at also allows you to use various helper functions to select variables by prefix, suffix, or regex (as shown below). You can learn more about them here: ?tidyselect::select_helpers

food %>%
    group_by(foodID) %>%
    summarize_at(vars(matches('calories|protein')), list(average = mean))
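
As a side note, in dplyr 1.0.0 and later summarize_at is superseded by across(), so the same summary can be sketched as below. This assumes a recent dplyr is installed; the food data frame is rebuilt from the values in the question, and the result name food_avg is only illustrative.

```r
library(dplyr)

# Data copied from the question.
food <- data.frame(
  foodID   = c(123L, 432L, 123L, 983L),
  calories = c(0.5, 0.65, 0.32, 0.82),
  fat      = c(0.4, 0.3, 0.6, 0.2),
  protein  = c(0.9, 0.7, 0.5, 0.6)
)

# across() applies mean to each selected column; .names builds the
# output column names, here "<column>_average".
food_avg <- food %>%
  group_by(foodID) %>%
  summarize(across(c(calories, protein), mean, .names = "{.col}_average"))

food_avg
```

The selection helpers work inside across() as well, for example across(matches('calories|protein'), mean).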

We can use the formula method of aggregate:

aggregate(cbind(calories, protein) ~ foodID, food, mean)

Or, using the OP's code, the index should be c(2, 4). With food[2, 4], R uses row/column indexing and selects a single value, the 2nd row of the 4th column, rather than the 2nd and 4th columns:

aggregate(food[c(2, 4)], list(food$foodID), mean)
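
To make the indexing difference concrete, here is a small base R sketch. It rebuilds food from the values in the question; the names by_formula and by_index, and the foodID label given to the grouping list, are only illustrative.

```r
# Data copied from the question.
food <- data.frame(
  foodID   = c(123L, 432L, 123L, 983L),
  calories = c(0.5, 0.65, 0.32, 0.82),
  fat      = c(0.4, 0.3, 0.6, 0.2),
  protein  = c(0.9, 0.7, 0.5, 0.6)
)

# Two indices: row 2, column 4, i.e. one number (protein of the 2nd row).
food[2, 4]

# One index vector: columns 2 and 4, i.e. a data frame with calories and protein.
str(food[c(2, 4)])

# Both calls average calories and protein within each foodID.
by_formula <- aggregate(cbind(calories, protein) ~ foodID, data = food, FUN = mean)
by_index   <- aggregate(food[c(2, 4)], by = list(foodID = food$foodID), FUN = mean)

by_formula
by_index
```

Naming the element of the by list (foodID = ...) gives the grouping column a readable name; with an unnamed list, aggregate labels it Group.1.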

EDIT: Based on @RuiBarradas's comments

You can use the data.table package:

library(data.table)
setDT(food)[, list(avg_calorie = mean(calories), avg_protein = mean(protein)), by = foodID]

Output:

    foodID avg_calorie avg_protein
1:    123        0.41         0.7
2:    432        0.65         0.7
3:    983        0.82         0.6
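
If there are more columns to average, a possible variation is to let data.table loop over them with .SD and .SDcols instead of writing one mean() per column. The sketch below assumes the data.table package is installed and rebuilds food from the question; food_dt, cols and avg_dt are illustrative names.

```r
library(data.table)

# Data copied from the question.
food <- data.frame(
  foodID   = c(123L, 432L, 123L, 983L),
  calories = c(0.5, 0.65, 0.32, 0.82),
  fat      = c(0.4, 0.3, 0.6, 0.2),
  protein  = c(0.9, 0.7, 0.5, 0.6)
)

# as.data.table() makes a copy, so food itself stays a plain data frame
# (setDT() would instead convert it in place).
food_dt <- as.data.table(food)

# .SD holds the columns named in .SDcols for each foodID group;
# lapply() takes the mean of each of them.
cols <- c("calories", "protein")
avg_dt <- food_dt[, lapply(.SD, mean), by = foodID, .SDcols = cols]

avg_dt
```

Here the result columns keep their original names (calories, protein) rather than the avg_ prefix used above.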
