I have a data frame with 5 columns. I'd like to group by one column and then summarise the remaining columns by their mean. However, I'd like the mean of each column to use only the values that fall within a certain range. Is this possible? I don't want to exclude the rows themselves, only the out-of-range values from each aggregation.
Current code:
a <- b %>% group_by(c) %>% summarise_all(funs(mean(., na.rm=T)))
If you want to use only a subset of data to compute the mean on, you can use a lambda function inside summarise().
However, if the subset is based on only one variable, you should simply use filter().
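As a rough sketch of the filter() route, assuming the condition applies to a single column (Sepal.Width from iris is picked here only for illustration, and iris_filtered is a made-up name): note that filter() drops whole rows, so the means of the other columns are computed from the remaining rows only. That suits a one-variable condition, but not the case where each column needs its own range check.

```r
library(dplyr)

# Keep only rows whose Sepal.Width lies strictly between 2 and 3
# (illustrative bounds), then average every remaining column per species.
iris_filtered <- iris %>%
  filter(Sepal.Width > 2, Sepal.Width < 3) %>%
  group_by(Species) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))

print(iris_filtered)
```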
Also, note that summarise_all() is superseded and we should use summarise(across()) instead.
Here is an example where the mean is computed using only the values strictly between 2 and 3. Columns with no value in that range return NaN.
library(tidyverse)
# Baseline: per-species mean of every column, using all values
iris %>%
  group_by(Species) %>%
  summarise(across(everything(), ~mean(.x, na.rm=TRUE)))
#> # A tibble: 3 x 5
#> Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa 5.01 3.43 1.46 0.246
#> 2 versicolor 5.94 2.77 4.26 1.33
#> 3 virginica 6.59 2.97 5.55 2.03
# Exclusive bounds: only values strictly between inf and sup are averaged
my_range = c(inf=2, sup=3)
iris %>%
  group_by(Species) %>%
  summarise(across(everything(), ~.x[.x>my_range["inf"] & .x<my_range["sup"]] %>% mean(na.rm=TRUE)))
#> # A tibble: 3 x 5
#> Species Sepal.Length Sepal.Width Petal.Length Petal.Width
#> <fct> <dbl> <dbl> <dbl> <dbl>
#> 1 setosa NaN 2.60 NaN NaN
#> 2 versicolor NaN 2.63 NaN NaN
#> 3 virginica NaN 2.69 NaN 2.27
Created on 2021-05-12 by the reprex package (v2.0.0)
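The NaN entries in the output above appear because mean() of an empty vector is NaN in R. If that is unwanted, one possible variation is to wrap the subsetting in a small helper. The sketch below is only an illustration: mean_in_range() and iris_range_means are hypothetical names, not part of dplyr, and the helper uses inclusive bounds, unlike the strict comparison in the answer.

```r
library(dplyr)

# Hypothetical helper: mean of the non-missing values of x lying in
# [lower, upper] (inclusive). Returns NA instead of NaN when no value
# falls inside the range.
mean_in_range <- function(x, lower, upper) {
  kept <- x[!is.na(x) & x >= lower & x <= upper]
  if (length(kept) == 0) NA_real_ else mean(kept)
}

# Apply the helper to every non-grouping column, per species.
iris_range_means <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), ~ mean_in_range(.x, 2, 3)))

print(iris_range_means)
```

Keeping the range logic in one function makes it easy to switch between inclusive and exclusive bounds, or to pass different bounds per call, without touching the summarise() line.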