R use group_by and summarise for 7 variables, but only get one result?

Question

I have a large dataset. I group it by year, select 7 variables, and then use summarise to get statistics for each variable within each group. However, I only get one set of statistics per group, not one per variable. How should I interpret the result, and how can I get statistics for every variable?

v<-colnames(Cashflow)[c(2,4:ncol(Cashflow))]
Cstats<-Cashflow%>%
  group_by(Y)%>%
  summarise(mean = mean(get(v),na.rm = TRUE),
            observation = n(),
            sd = sd(get(v),na.rm = TRUE),
            min = min(get(v),na.rm = TRUE),
            q25 = quantile(get(v),probs = c(0.25),na.rm = TRUE),
            median = median(get(v),na.rm = TRUE),
            q75 = quantile(get(v),probs = c(0.75),na.rm = TRUE),
            max = max(get(v),na.rm = TRUE))

My result looks like this:

year mean sd min
1997 1    2   3
1998 2    3   4

And when I wrap this in a for loop:

    for (name in v){
      Cashflow%>%
      group_by(Y)%>%
      summarise(mean = mean(get(name),na.rm = TRUE),
                observation = n(),
                sd = sd(get(name),na.rm = TRUE),

I get the error:

summarise() ungrouping output (override with .groups argument)

Could someone give me some advice on this?

Answer 1

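To compute these statistics for several columns at once, use across() instead of get(). get() looks up a single name, so get(v) only uses the first column named in v, which is why the result has one set of statistics per group.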

library(dplyr)

Cashflow %>%
  group_by(Y) %>%
  # all_of() selects the columns named in the character vector v
  summarise(across(all_of(v),
                   list(mean = ~ mean(., na.rm = TRUE),
                        sd = ~ sd(., na.rm = TRUE),
                        min = ~ min(., na.rm = TRUE),
                        median = ~ median(., na.rm = TRUE),
                        q25 = ~ quantile(., probs = 0.25, na.rm = TRUE),
                        q75 = ~ quantile(., probs = 0.75, na.rm = TRUE))),
            observation = n(), .groups = 'drop')
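A note on the for loop in the question: the "summarise() ungrouping output" line is an informational message from dplyr rather than an error. A pipeline evaluated inside a for loop is also not printed automatically, so its result is lost unless it is stored or printed. Below is a minimal sketch of a loop that keeps each result. It assumes the built-in mtcars data, an illustrative subset of columns, and a hypothetical list called results (none of these come from the question). .data[[name]] looks up the column by its string name.

```r
library(dplyr)

# Illustrative subset of columns; not the asker's data
v <- c("mpg", "disp", "hp")

# Hypothetical container: one summary table per variable
results <- list()

for (name in v) {
  results[[name]] <- mtcars %>%
    group_by(gear) %>%
    summarise(mean = mean(.data[[name]], na.rm = TRUE),
              sd = sd(.data[[name]], na.rm = TRUE),
              observation = n(),
              .groups = "drop")  # setting .groups silences the message
}

# Inside or after the loop, print explicitly to see a table
print(results[["mpg"]])
```

The across() approach is usually shorter, but a list of per-variable tables can be handy when each variable needs separate handling afterwards.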

Using a reproducible example with mtcars:

data(mtcars)
v <- names(mtcars)[c(1, 3:7)]
mtcars %>%
  group_by(gear) %>%
  # all_of() selects the columns named in the character vector v
  summarise(across(all_of(v),
                   list(mean = ~ mean(., na.rm = TRUE),
                        sd = ~ sd(., na.rm = TRUE),
                        min = ~ min(., na.rm = TRUE),
                        median = ~ median(., na.rm = TRUE),
                        q25 = ~ quantile(., probs = 0.25, na.rm = TRUE),
                        q75 = ~ quantile(., probs = 0.75, na.rm = TRUE))),
            observation = n(), .groups = 'drop')
# A tibble: 3 x 39
#  gear mpg_mean mpg_sd mpg_min mpg_median mpg_q25 mpg_q75 disp_mean disp_sd disp_min disp_median disp_q25 disp_q75 hp_mean hp_sd
#  <dbl>    <dbl>  <dbl>   <dbl>      <dbl>   <dbl>   <dbl>     <dbl>   <dbl>    <dbl>       <dbl>    <dbl>    <dbl>   <dbl> <dbl>
#1     3     16.1   3.37    10.4       15.5    14.5    18.4      326.    94.9    120.         318     276.       380   176.   47.7
#2     4     24.5   5.28    17.8       22.8    21      28.1      123.    38.9     71.1        131.     78.9      160    89.5  25.9
#3     5     21.4   6.66    15         19.7    15.8    26        202.   115.      95.1        145     120.       301   196.  103. 
# … with 24 more variables: hp_min <dbl>, hp_median <dbl>, hp_q25 <dbl>, hp_q75 <dbl>, drat_mean <dbl>, drat_sd <dbl>,
#   drat_min <dbl>, drat_median <dbl>, drat_q25 <dbl>, drat_q75 <dbl>, wt_mean <dbl>, wt_sd <dbl>, wt_min <dbl>, wt_median <dbl>,
#   wt_q25 <dbl>, wt_q75 <dbl>, qsec_mean <dbl>, qsec_sd <dbl>, qsec_min <dbl>, qsec_median <dbl>, qsec_q25 <dbl>, qsec_q75 <dbl>,
#   observation <int>
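
The wide result above has one column per variable and statistic, which can be hard to read. If one row per group and variable is preferred, a possible alternative is to reshape to long format first. This is only a sketch: it assumes the tidyr package is available, uses an illustrative subset of mtcars columns, and the names long_stats, variable and value are made up for the example.

```r
library(dplyr)
library(tidyr)

# Illustrative subset of columns; not the asker's data
v <- c("mpg", "disp", "hp")

long_stats <- mtcars %>%
  # Stack the selected columns into name/value pairs
  pivot_longer(all_of(v), names_to = "variable", values_to = "value") %>%
  group_by(gear, variable) %>%
  summarise(mean = mean(value, na.rm = TRUE),
            sd = sd(value, na.rm = TRUE),
            min = min(value, na.rm = TRUE),
            median = median(value, na.rm = TRUE),
            max = max(value, na.rm = TRUE),
            observation = n(),  # counts rows, including any NA values
            .groups = "drop")

print(long_stats)
```

Each row of long_stats then holds the statistics of one variable within one gear group.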

solution1
2 ACCEPTED 2020-09-13 21:48:11