简体   繁体   中英

Aggregating two columns in R dataframe

I have a dataframe in R called food:

foodID   calories   fat    protein

 123       0.5      0.4     0.9
 432       0.65     0.3     0.7
 123       0.32     0.6     0.5
 983       0.82     0.2     0.6

and I'm trying to average up the calories and protein column by foodID .

I tried:

cal_pro <- aggregate(food[2,4], list(food$foodID), function(df) mean(df))

But it appears that i can't select the columns to be applied the mean function by food[2,4]? Could anyone help me out on this.

Using dplyr , you can just group_by and summarize :

food %>%
    group_by(foodID) %>%
    summarize(calories_average = mean(calories),
              protein_average = mean(protein))

# A tibble: 3 x 3
  foodID calories_average protein_average
   <int>            <dbl>           <dbl>
1    123             0.41             0.7
2    432             0.65             0.7
3    983             0.82             0.6

Rather than specifying each variable, you can use summarize_at to select multiple variables to summarize at once. We pass in 2 arguments: the variables to summarize, and a list of functions to apply to them. If the list is named, as it is here, then the name is added to the summary column as a suffix (giving "calores_average" and "protein_average":

food %>%
    group_by(foodID) %>%
    summarize_at(c('calories', 'protein'), list(average = mean))

summarize_at also allows you to use various helper functions to select variables by prefix, suffix, or regex (as shown below). You can learn more about them here: ?tidyselect::select_helpers

food %>%
    group_by(foodID) %>%
    summarize_at(vars(matches('calories|protein')), list(average = mean))

We can use the formula method

aggregate(cbind(calories, protein) ~ foodID, food, mean)

Or using the OP's code, it should be c(2, 4), because if we do 2, 4 , it is selecting the 2nd row of 4th column by row/column indexing

aggregate(food[c(2, 4)], list(food$foodID), mean)

EDIT: Based on @RuiBarradas comments

You can use data.table package-

> setDT(dt)[,list(avg_calorie=mean(calories),avg_protein=mean(protein)),by=foodID]

Output-

    foodID avg_calorie avg_protein
1:    123        0.41         0.7
2:    432        0.65         0.7
3:    983        0.82         0.6

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM