
R: using group_by and summarise on 7 variables, but only getting one result?

I have a large dataset. I group it by year, select 7 variables, and then use summarise to try to get statistics for each variable within each group. However, I only get one set of statistics per group, not one per variable. How should I interpret this result, and how can I get the statistics for each variable?

v<-colnames(Cashflow)[c(2,4:ncol(Cashflow))]
Cstats<-Cashflow%>%
  group_by(Y)%>%
  summarise(mean = mean(get(v),na.rm = TRUE),
            observation = n(),
            sd = sd(get(v),na.rm = TRUE),
            min = min(get(v),na.rm = TRUE),
            q25 = quantile(get(v),probs = c(0.25),na.rm = TRUE),
            median = median(get(v),na.rm = TRUE),
            q75 = quantile(get(v),probs = c(0.75),na.rm = TRUE),
            max = max(get(v),na.rm = TRUE))

My result looks like this:

year mean sd min
1997 1    2   3
1998 2    3   4

And once I add a for loop:

    for (name in v){
      Cashflow%>%
      group_by(Y)%>%
      summarise(mean = mean(get(name),na.rm = TRUE),
                observation = n(),
                sd = sd(get(name),na.rm = TRUE),

I get the error:

summarise() ungrouping output (override with .groups argument)

summarise() ungrouping output (override with .groups argument)

summarise() ungrouping output (override with .groups argument)

Could someone give me some advice on this?

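To compute the statistics for several columns at once, use across() instead of get(). get() looks up a single name, so get(v) only ever returns the first column named in v; that is why the result has one set of statistics per year rather than one per variable.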

library(dplyr)
Cashflow %>%
   group_by(Y) %>%
   # all_of() selects the columns named in the character vector v
   summarise(across(all_of(v),
                    list(mean = ~ mean(., na.rm = TRUE),
                         sd = ~ sd(., na.rm = TRUE),
                         min = ~ min(., na.rm = TRUE),
                         median = ~ median(., na.rm = TRUE),
                         q25 = ~ quantile(., probs = 0.25, na.rm = TRUE),
                         q75 = ~ quantile(., probs = 0.75, na.rm = TRUE))),
             # number of rows in each group
             observation = n(), .groups = 'drop')
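
On the loop in the question: the "summarise() ungrouping output" lines are informational messages rather than errors, and nothing else appears because results are not printed automatically inside a for loop and were never stored. If a loop is preferred, a minimal sketch could look like the following. It assumes mtcars as stand-in data (gear plays the role of Y), and the names stats_by_var and all_stats are made up for illustration.

```r
library(dplyr)

# Stand-in data: mtcars replaces Cashflow, gear replaces Y (assumption).
v <- c("mpg", "hp", "wt")

# Collect one summary table per variable in a named list.
stats_by_var <- list()
for (name in v) {
  stats_by_var[[name]] <- mtcars %>%
    group_by(gear) %>%
    # .data[[name]] refers to the column whose name is stored in `name`
    summarise(mean = mean(.data[[name]], na.rm = TRUE),
              sd = sd(.data[[name]], na.rm = TRUE),
              observation = n(),
              .groups = "drop")  # setting .groups also silences the message
}

# Stack the per-variable tables; the list names go into a "variable" column.
all_stats <- bind_rows(stats_by_var, .id = "variable")
print(all_stats)
```

Storing each table and binding them at the end gives one row per variable and group, which is what the loop was presumably meant to produce.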

Using a reproducible example

data(mtcars)
v <- names(mtcars)[c(1, 3:7)]
mtcars %>%
   group_by(gear) %>%
   # all_of() selects the columns named in the character vector v
   summarise(across(all_of(v),
                    list(mean = ~ mean(., na.rm = TRUE),
                         sd = ~ sd(., na.rm = TRUE),
                         min = ~ min(., na.rm = TRUE),
                         median = ~ median(., na.rm = TRUE),
                         q25 = ~ quantile(., probs = 0.25, na.rm = TRUE),
                         q75 = ~ quantile(., probs = 0.75, na.rm = TRUE))),
             # number of rows in each group
             observation = n(), .groups = 'drop')
# A tibble: 3 x 39
#  gear mpg_mean mpg_sd mpg_min mpg_median mpg_q25 mpg_q75 disp_mean disp_sd disp_min disp_median disp_q25 disp_q75 hp_mean hp_sd
#  <dbl>    <dbl>  <dbl>   <dbl>      <dbl>   <dbl>   <dbl>     <dbl>   <dbl>    <dbl>       <dbl>    <dbl>    <dbl>   <dbl> <dbl>
#1     3     16.1   3.37    10.4       15.5    14.5    18.4      326.    94.9    120.         318     276.       380   176.   47.7
#2     4     24.5   5.28    17.8       22.8    21      28.1      123.    38.9     71.1        131.     78.9      160    89.5  25.9
#3     5     21.4   6.66    15         19.7    15.8    26        202.   115.      95.1        145     120.       301   196.  103. 
# … with 24 more variables: hp_min <dbl>, hp_median <dbl>, hp_q25 <dbl>, hp_q75 <dbl>, drat_mean <dbl>, drat_sd <dbl>,
#   drat_min <dbl>, drat_median <dbl>, drat_q25 <dbl>, drat_q75 <dbl>, wt_mean <dbl>, wt_sd <dbl>, wt_min <dbl>, wt_median <dbl>,
#   wt_q25 <dbl>, wt_q75 <dbl>, qsec_mean <dbl>, qsec_sd <dbl>, qsec_min <dbl>, qsec_median <dbl>, qsec_q25 <dbl>, qsec_q75 <dbl>,
#   observation <int>
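
The wide result above has one column per variable and statistic, which can be hard to read. If one row per group and variable is closer to what the question asks for, a possible alternative is to reshape to long format first. This is only a sketch: it assumes the tidyr package is installed, and the names long_stats, variable and value are chosen here for illustration.

```r
library(dplyr)
library(tidyr)

v <- names(mtcars)[c(1, 3:7)]

long_stats <- mtcars %>%
  select(gear, all_of(v)) %>%
  # Stack the selected columns: one row per (car, variable) pair.
  pivot_longer(all_of(v), names_to = "variable", values_to = "value") %>%
  group_by(gear, variable) %>%
  summarise(mean = mean(value, na.rm = TRUE),
            sd = sd(value, na.rm = TRUE),
            min = min(value, na.rm = TRUE),
            q25 = quantile(value, probs = 0.25, na.rm = TRUE),
            median = median(value, na.rm = TRUE),
            q75 = quantile(value, probs = 0.75, na.rm = TRUE),
            max = max(value, na.rm = TRUE),
            observation = n(),  # rows per group, including missing values
            .groups = "drop")

print(long_stats)
```

Each row of long_stats then holds all statistics for one variable within one gear group, so the statistic names stay as plain column headers.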
