
Extract min/max by group in R

(Using Iris for reproducibility)

I want to extract the rows with the minimum and maximum Petal.Width within each Species in R. I have done this using two approaches, and I would like to know whether there is a better one (preferably tidyverse). Note that because of ties, the two approaches may return different rows. Please point out anything that is wrong with either approach.

Approach 1

library(tidyverse)

iris %>%
  group_by(Species) %>%
  slice_max(Petal.Width, n = 1, with_ties = FALSE) %>%
  rbind(
    iris %>%
      group_by(Species) %>%
      slice_min(Petal.Width, n = 1, with_ties = FALSE)
  )

Approach 2

iris %>% 
  group_by(Species) %>% 
  arrange(Petal.Width) %>% 
  filter(row_number() %in% c(1,n()))
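On the tie question: one way to see how much ties matter is to keep them and count the rows per group. The sketch below is only a check, not a replacement for either approach; `ties_min` and `ties_max` are names made up for illustration.

```r
library(dplyr)

# Keep every row tied for the group minimum (with_ties = TRUE is the default).
ties_min <- iris %>%
  group_by(Species) %>%
  slice_min(Petal.Width, n = 1, with_ties = TRUE)

# Same for the group maximum.
ties_max <- iris %>%
  group_by(Species) %>%
  slice_max(Petal.Width, n = 1, with_ties = TRUE)

# Number of tied rows per species. Any count above 1 means the single row
# returned with with_ties = FALSE depends on row order.
summarise(ties_min, n_rows = n())
summarise(ties_max, n_rows = n())
```

As far as I can tell, this is where the two approaches can diverge: Approach 2 takes the first and last row of each sorted group, so for a tied maximum it picks the last tied row, while slice_max() with with_ties = FALSE should pick the first one.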

Here is a way to do it with summarise(across()). Note that it returns the min and max values per group rather than the full rows:

library(dplyr)
iris %>%
  group_by(Species) %>%
  summarise(across(.cols = Petal.Width, 
                   .fns = list(min = min, max = max), 
                   .names = "{col}_{fn}"))

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 3
  Species    Petal.Width_min Petal.Width_max
  <fct>                <dbl>           <dbl>
1 setosa                 0.1             0.6
2 versicolor             1               1.8
3 virginica              1.4             2.5
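For a single column, the same summary can be sketched without across() by naming each result directly. The column names below are chosen to match the output above, and `width_range` is just an illustrative name.

```r
library(dplyr)

width_range <- iris %>%
  group_by(Species) %>%
  summarise(Petal.Width_min = min(Petal.Width),
            Petal.Width_max = max(Petal.Width),
            .groups = "drop")  # drop grouping explicitly; this also silences the message

width_range
```

Setting `.groups` is optional here; it only makes the ungrouping explicit.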

You can find the min and max of every numeric variable in a data set the same way:

iris %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric), 
                   .fns = list(min = min, max = max), 
                   .names = "{col}_{fn}"))

`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 9
  Species    Sepal.Length_min Sepal.Length_max Sepal.Width_min Sepal.Width_max Petal.Length_min Petal.Length_max Petal.Width_min Petal.Width_max
  <fct>                 <dbl>            <dbl>           <dbl>           <dbl>            <dbl>            <dbl>           <dbl>           <dbl>
1 setosa                  4.3              5.8             2.3             4.4              1                1.9             0.1             0.6
2 versicolor              4.9              7               2               3.4              3                5.1             1               1.8
3 virginica               4.9              7.9             2.2             3.8              4.5              6.9             1.4             2.5
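The wide result above can be hard to read. If a longer layout is preferred, one possible sketch is to reshape it with tidyr (an extra package not used in the answer above). It relies on the assumption that "_" appears in the column names only as the separator added by `.names`, which holds for iris. `iris_long` and `iris_range` are illustrative names.

```r
library(dplyr)
library(tidyr)

# One row per species, variable and statistic.
iris_long <- iris %>%
  group_by(Species) %>%
  summarise(across(where(is.numeric),
                   .fns = list(min = min, max = max),
                   .names = "{col}_{fn}"),
            .groups = "drop") %>%
  pivot_longer(-Species,
               names_to = c("variable", "stat"),
               names_sep = "_")

# One row per species and variable, with min and max as columns.
iris_range <- iris_long %>%
  pivot_wider(names_from = stat, values_from = value)

iris_range
```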

Using base R's aggregate():

aggregate(Petal.Width ~ Species, iris, function(x) c(min=min(x), max=max(x)))
#      Species Petal.Width.min Petal.Width.max
# 1     setosa             0.1             0.6
# 2 versicolor             1.0             1.8
# 3  virginica             1.4             2.5
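One thing to be aware of with this form: although the printout looks like three columns, aggregate() stores the min and max in a single matrix column. If ordinary columns are needed, a common workaround is to rebuild the data frame with do.call(); `res` and `flat` are illustrative names.

```r
# aggregate() with a vector-valued function returns a matrix column,
# so the result has 2 columns (Species and Petal.Width), not 3.
res <- aggregate(Petal.Width ~ Species, iris,
                 function(x) c(min = min(x), max = max(x)))
dim(res)

# Flatten the matrix column into ordinary columns.
flat <- do.call(data.frame, res)
names(flat)
```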

You could also use slice(), as below:

iris %>%
  group_by(Species) %>%
  slice(which.min(Petal.Width),
        which.max(Petal.Width))

Output:

# A tibble: 6 x 5
# Groups:   Species [3]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species   
         <dbl>       <dbl>        <dbl>       <dbl> <fct>     
1          5           3.5          1.6         0.6 setosa    
2          5.9         3.2          4.8         1.8 versicolor
3          6.3         3.3          6           2.5 virginica 
4          4.9         3.1          1.5         0.1 setosa    
5          4.9         2.4          3.3         1   versicolor
6          6.1         2.6          5.6         1.4 virginica 
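Since which.min() and which.max() each return only the first matching position, this picks one row per extreme, much like with_ties = FALSE. If it helps to see which row is which, a label column can be added. This is a sketch that assumes every group yields exactly two rows, as is the case for iris; `extremes` and `extreme` are made-up names.

```r
library(dplyr)

extremes <- iris %>%
  group_by(Species) %>%
  # Row positions are returned in the order given: min first, then max.
  slice(c(which.min(Petal.Width), which.max(Petal.Width))) %>%
  # Label the two rows of each group accordingly.
  mutate(extreme = c("min", "max")) %>%
  ungroup()

extremes
```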


 