Can I group in a loop in the tidyverse?
The bigger task is to replace a grouping variable with NA if there are few observations in the group. I want to consolidate small groups into an NA group.
However, the code below won't let me group_by(x) where x is the looping variable.
library(tidyverse)
for (x in c("cyl", "gear")) {
  mtcars %>%
    add_count(x) %>%
    mutate(x = ifelse(n() < 10, NA, x))
}
I receive the following error.
Error in grouped_df_impl(data, unname(vars), drop) :
Column `x` is unknown
Do you mean something like this?
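library(dplyr)
for (x in c("cyl", "gear")) {
  # Turn the column name (a string) into a symbol so it can be unquoted with !!
  col <- sym(x)
  mtcars <- mtcars %>%
    add_count(!!col) %>%                          # adds a column n with the group size
    mutate(!!col := ifelse(n < 10, NA, !!col)) %>%  # := is needed when the left-hand side is unquoted
    select(-n)                                    # drop the helper count column
}
mtcars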
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 NA 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 NA 160 110 3.9 2.88 17.0 0 1 4 4
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 21.4 NA 258 110 3.08 3.22 19.4 1 0 3 1
#> 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 6 18.1 NA 225 105 2.76 3.46 20.2 1 0 3 1
#> 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> 8 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 19.2 NA 168. 123 3.92 3.44 18.3 1 0 4 4
#> # ... with 22 more rows
Created on 2018-12-08 by the reprex package (v0.2.1)
(Not the easiest syntax, I know....)
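If the sym() and !! syntax feels heavy, here is a sketch of the same loop written with the .data pronoun instead. It assumes dplyr 1.0.0 or later (for the "{x}" := naming syntax); the names result, min_n and group_size are made up for illustration and do not come from the code above.

```r
library(dplyr)

min_n <- 10          # illustrative threshold, same value as in the question
result <- mtcars     # work on a copy instead of overwriting mtcars

for (x in c("cyl", "gear")) {
  result <- result %>%
    # .data[[x]] looks up the column whose name is stored in the string x
    add_count(.data[[x]], name = "group_size") %>%
    # "{x}" := ... names the output column after the value of x
    mutate("{x}" := ifelse(group_size < min_n, NA, .data[[x]])) %>%
    select(-group_size)
}
```

With the built-in mtcars data this should set cyl to NA in the 7 rows where cyl is 6 and gear to NA in the 5 rows where gear is 5, since those are the only values with fewer than 10 observations.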
You could also use mutate_at with table:
library(tidyverse)
mtcars %>%
  mutate_at(vars(cyl, gear), ~ {
    t <- table(.)
    ifelse(. %in% names(t[t < 10]), NA, .)
  })
The function can be simplified to one line with purrr::keep:
mtcars %>%
  mutate_at(vars(cyl, gear),
            ~ ifelse(. %in% names(keep(table(.), `<`, 10)), NA, .))
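Since dplyr 1.0.0, mutate_at() has been superseded by across(). A minimal sketch of the same idea in that style follows; the helper name na_if_rare and its min_n argument are hypothetical, introduced here only to keep the table() logic readable.

```r
library(dplyr)

# Hypothetical helper: return v with every value that occurs fewer than
# min_n times replaced by NA.
na_if_rare <- function(v, min_n = 10) {
  counts <- table(v)                        # frequency of each distinct value
  rare <- names(counts[counts < min_n])     # values below the threshold, as strings
  ifelse(as.character(v) %in% rare, NA, v)
}

result <- mtcars %>%
  mutate(across(c(cyl, gear), na_if_rare))
```

Pulling the logic into a named function also makes it easy to try on a plain vector, for example na_if_rare(c(1, 1, 2), min_n = 2).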
Or if you happen to be working with a data.table, you can use an "update join" to subset to groups with low counts, then assign NA to that subset:
library(data.table)
dt <- as.data.table(mtcars)
for (x in c('cyl', 'gear'))
  dt[dt[, .N, x][N < 10], on = x, (x) := NA]
This will achieve the same result:
all.equal(
  dt,
  mtcars %>%
    mutate_at(vars(cyl, gear),
              ~ ifelse(. %in% names(keep(table(.), `<`, 10)), NA, .)) %>%
    setDT()
)
# [1] TRUE
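To see what the update join in the data.table loop is doing, here is a sketch that splits a single iteration (x set to "cyl") into its intermediate steps. The names counts and small are illustrative only.

```r
library(data.table)

dt <- as.data.table(mtcars)
x <- "cyl"                    # one iteration of the loop

counts <- dt[, .N, by = x]    # one row per distinct cyl value, with its count N
small  <- counts[N < 10]      # keep only the values seen fewer than 10 times

# Join dt to `small` on the shared column and overwrite that column with NA
# in the matching rows only; := modifies dt by reference.
dt[small, on = x, (x) := NA]
```

Rows of dt whose cyl value does not appear in small are left untouched, which is why no ifelse() is needed here.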