简体   繁体   中英

aggregate multiple columns in a data frame at once calculating different statistics on different columns - R

I am trying to perform what should be a simple exercise of aggregating columns in a dataframe by the first column. Each column is aggregated with a summary statistic, but they are not the same. For example several columns I need the mean, several I need standard deviation, several I need the max value, etc. I am using documentation with the aggregate function within R and for some reason can't get it.

Can you help?

Attached is a simple example data set. For illustration purposes I need to aggregate by the "name" column. Var1 I need the mean, Var2 I need the standard deviation, Var3 I need the max value, Var4 I would like the sum

name    v1  v2  v3  v4  v5
a     78    74  43  76  61
b     50    84  76  73  35
c     4    55   4   86  32
d     23    29  81  73  55
e     36    68  27  70  96

# A tibble: 11 x 5
   name     v1    v2    v3    v4
   <fct> <dbl> <dbl> <int> <int>
 1 a        78   NaN    43    76
 2 b        50   NaN    76    73
 3 c         4   NaN     4    86
 4 d        23   NaN    81    73
 5 e        36   NaN    27    70
 6 f        13   NaN    34    77
 7 g         2   NaN    57    79
 8 h        72   NaN    29     3
 9 i        17   NaN    56    14
10 j        87   NaN    36    13
11 k         9   NaN    74    41

We could use dplyr for flexibility

library(dplyr)
df1 %>%
     group_by(name) %>% 
     summarise(v1 = mean(v1, na.rm = TRUE),
    v2 = sd(v2, na.rm = TRUE), v3 = max(v3, na.rm = TRUE),
            v4 = sum(v4, na.rm = TRUE))

If there are multiple columns to be blocked for different functions, use across

df1 %>%
    group_by(name) %>%
    summarise(across(c(v1, v2), mean, na.rm = TRUE),
               v3 = sd(v3, na.rm = TRUE),
               across(c(v4, v5), sum, na.rm = TRUE))

Or use collap from collapse

library(collapse)
collap(df1, ~ name, custom = list(fmean = c("v1", "v2"),
      fsd = "v3", fsum = c("v4", "v5")))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM