dplyr case_when with dynamic number of cases
I want to use dplyr and case_when to collapse a series of indicator columns into a single column. The challenge is that I want to be able to collapse over an unspecified/dynamic number of columns.
Consider the following dataset, where gear has been split into a series of indicator columns.
library(dplyr)
data(mtcars)
mtcars = mtcars %>%
mutate(g2 = ifelse(gear == 2, 1, 0),
g3 = ifelse(gear == 3, 1, 0),
g4 = ifelse(gear == 4, 1, 0)) %>%
select(g2, g3, g4)
I am trying to write a function that does the reverse.
When I know how many cases there are, this can be done as follows:
combine_indices = function(db, cols, vals){
db %>% mutate(new_col = case_when(!!sym(cols[1]) == 1 ~ vals[1],
!!sym(cols[2]) == 1 ~ vals[2],
!!sym(cols[3]) == 1 ~ vals[3]))
}
cols = c("g2", "g3", "g4")
vals = c(2,3,4)
combine_indices(mtcars, cols, vals)
However, I would like the combine_indices function to handle any number of index columns (right now it works for exactly three).
According to the documentation (?case_when), "if your patterns are stored in a list, you can splice that in with !!!". But I cannot get this working:
patterns = list(sym(cols[1]) == 1 ~ vals[1],
sym(cols[2]) == 1 ~ vals[2],
sym(cols[3]) == 1 ~ vals[3])
mtcars %>% mutate(new_col = case_when(!!!patterns))
This only produces a new column filled with NAs.
If !!!patterns worked, it would be straightforward to take the vectors cols and vals and generate patterns. However, I cannot get the quosures right. I am hoping someone more familiar with quosures knows how.
Note: some similar questions here on SO were solved using joins or other functions. However, I am restricted to using case_when because of how it translates to SQL when using dbplyr.
We can create a string of conditions, turn it into expressions with parse_exprs, and splice it in (!!!).
library(dplyr)
library(rlang)
combine_indices = function(db, cols, vals){
db %>% mutate(new_col = case_when(!!!parse_exprs(paste(cols, '== 1 ~', vals))))
}
cols = c("g2", "g3", "g4")
vals = c(2,3,4)
combine_indices(mtcars, cols, vals)
which returns:
# g2 g3 g4 new_col
#1 0 0 1 4
#2 0 0 1 4
#3 0 0 1 4
#4 0 1 0 3
#5 0 1 0 3
#6 0 1 0 3
#....
where paste generates the conditions for case_when dynamically:
paste(cols, '== 1 ~', vals)
#[1] "g2 == 1 ~ 2" "g3 == 1 ~ 3" "g4 == 1 ~ 4"
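Since the question's constraint is that the result must still translate to SQL under dbplyr, one way to check the spliced version is to render the query against a simulated connection. This is a sketch assuming a recent dbplyr, where lazy_frame() and simulate_dbi() build a table reference without a real database; the exact SQL text varies between dbplyr versions, so the code only prints it.

```r
library(dplyr)
library(dbplyr)
library(rlang)

combine_indices <- function(db, cols, vals) {
  # paste() builds "g2 == 1 ~ 2", ...; parse_exprs() turns each string into a call
  db %>% mutate(new_col = case_when(!!!parse_exprs(paste(cols, '== 1 ~', vals))))
}

# A one-row lazy table on a simulated backend: nothing is sent to a database
lf <- lazy_frame(g2 = 1, g3 = 0, g4 = 0, con = simulate_dbi())

query <- combine_indices(lf, c("g2", "g3", "g4"), c(2, 3, 4))
sql_text <- as.character(sql_render(query))
cat(sql_text, "\n")
```

If the spliced conditions translate, the printed query should contain a CASE WHEN expression aliased as new_col.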
This solution should create an indicator column for every value in the gear column:
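library(dplyr)
library(tidyr)  # pivot_wider() and replace_na() come from tidyr
data <- mtcars %>%
# keep a copy of gear: pivot_wider() consumes the names_from column
mutate(mygear = gear) %>%
# one column per gear value (g4, g3, g5), holding the gear value or NA
pivot_wider(values_from = gear, names_from = gear, names_prefix = "g") %>%
# gear / mygear is 1 where the column matches and NA elsewhere
mutate_at(vars(starts_with('g')), function(x) x/.$mygear) %>%
mutate_if(is.numeric, replace_na, replace = 0) %>%
rename(gear = mygear)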
I do need to create a temporary column, mygear, because pivot_wider does not retain the pivot column.
> data
# A tibble: 32 x 14
mpg cyl disp hp drat wt qsec vs am carb gear g4 g3 g5
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 21 6 160 110 3.9 2.62 16.5 0 1 4 4 1 0 0
2 21 6 160 110 3.9 2.88 17.0 0 1 4 4 1 0 0
3 22.8 4 108 93 3.85 2.32 18.6 1 1 1 4 1 0 0
4 21.4 6 258 110 3.08 3.22 19.4 1 0 1 3 0 1 0
5 18.7 8 360 175 3.15 3.44 17.0 0 0 2 3 0 1 0
6 18.1 6 225 105 2.76 3.46 20.2 1 0 1 3 0 1 0
7 14.3 8 360 245 3.21 3.57 15.8 0 0 4 3 0 1 0
8 24.4 4 147. 62 3.69 3.19 20 1 0 2 4 1 0 0
9 22.8 4 141. 95 3.92 3.15 22.9 1 0 2 4 1 0 0
10 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4 1 0 0
# … with 22 more rows
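A variation on the same idea that skips the division trick: spread a constant 1 instead of the gear value and let pivot_wider() fill the gaps with 0. This is a sketch assuming tidyr 1.1.0 or later, where values_fill accepts a single scalar; the flag and wide names are illustrative.

```r
library(dplyr)
library(tidyr)

wide <- mtcars %>%
  # keep a copy of gear, because pivot_wider() drops the names_from column
  mutate(mygear = gear, flag = 1) %>%
  # spread the constant 1 into g4/g3/g5; values_fill = 0 fills the other cells
  pivot_wider(names_from = gear, values_from = flag,
              names_prefix = "g", values_fill = 0) %>%
  rename(gear = mygear)

head(wide)
```

As in the answer above, this relies on every mtcars row being unique across the remaining columns; otherwise pivot_wider() would have to combine duplicates.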
You need to turn each object in the list into an unevaluated expression with expr(), so that it is evaluated by case_when.
To be honest, I didn't fully understand it, but it works.
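A likely explanation, offered as a sketch rather than a definitive account: in the question's list() version, the formula's left-hand side is the literal call sym(cols[1]) == 1, so case_when() evaluates sym() at that point and compares the resulting symbol with 1 instead of comparing a column with 1. R's comparison operators deparse symbols to strings before comparing, so the condition is a single FALSE and every row falls through to NA. With expr() and !!, the symbol is injected first, producing the call g2 == 1 ~ 2.

```r
library(rlang)

col <- "g2"

# Roughly what the list() version evaluates: a symbol compared with 1,
# which R treats as the string comparison "g2" == "1"
bad_condition <- sym(col) == 1

# expr() captures the code; !! injects the symbol g2 before capture completes
good_pattern <- expr(!!sym(col) == 1 ~ 2)

print(bad_condition)
print(good_pattern)
```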
patterns <- list(expr(!!sym(cols[1]) == 1 ~ vals[1]),
expr(!!sym(cols[2]) == 1 ~ vals[2]),
expr(!!sym(cols[3]) == 1 ~ vals[3]))
Or, more simply:
patterns <- exprs(!!sym(cols[1]) == 1 ~ vals[1],
!!sym(cols[2]) == 1 ~ vals[2],
!!sym(cols[3]) == 1 ~ vals[3])
mtcars %>% mutate(new_col = case_when(!!!patterns))
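The exprs() version above still hard-codes three patterns. A minimal sketch of generalizing it to any number of columns, assuming the question's indicator layout: build one pattern per (column, value) pair with base R's Map() and inject each value with !! so the expression no longer refers to vals. The g5 column is added here for illustration so that every gear value in the built-in mtcars is covered.

```r
library(dplyr)
library(rlang)

combine_indices <- function(db, cols, vals) {
  # one unevaluated pattern per pair, e.g. g2 == 1 ~ 2
  patterns <- Map(function(col, val) expr(!!sym(col) == 1 ~ !!val), cols, vals)
  # unname() so case_when() receives plain, unnamed conditions
  db %>% mutate(new_col = case_when(!!!unname(patterns)))
}

# Indicator columns built from the built-in mtcars (g2 is always 0 there)
ind <- mtcars %>%
  mutate(g2 = ifelse(gear == 2, 1, 0),
         g3 = ifelse(gear == 3, 1, 0),
         g4 = ifelse(gear == 4, 1, 0),
         g5 = ifelse(gear == 5, 1, 0)) %>%
  select(g2, g3, g4, g5)

res <- combine_indices(ind, c("g2", "g3", "g4", "g5"), c(2, 3, 4, 5))
head(res)
```

Because the values are inlined as constants, the spliced case_when() contains only column names and literals.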
For the sake of completeness: for this particular use case only, the result can also be obtained with matrix multiplication:
library(dplyr)
combine_indices = function(db, cols, vals){
db %>% mutate(new_col = drop(as.matrix(db[, cols]) %*% vals))  # drop() turns the 1-column matrix into a plain vector
}
cols = c("g2", "g3", "g4")
vals = c(2, 3, 4)
combine_indices(mtcars, cols, vals)
   g2 g3 g4 new_col
1   0  0  1       4
2   0  0  1       4
3   0  0  1       4
4   0  1  0       3
5   0  1  0       3
6   0  1  0       3
7   0  1  0       3
8   0  0  1       4
9   0  0  1       4
10  0  0  1       4
11  0  0  1       4
12  0  1  0       3
13  0  1  0       3
14  0  1  0       3
15  0  1  0       3
16  0  1  0       3
17  0  1  0       3
18  0  0  1       4
19  0  0  1       4
20  0  0  1       4
21  0  1  0       3
22  0  1  0       3
23  0  1  0       3
24  0  1  0       3
25  0  1  0       3
26  0  0  1       4
27  0  0  0       0
28  0  0  0       0
29  0  0  0       0
30  0  0  0       0
31  0  0  0       0
32  0  0  1       4
For row 1, for example, we get:
0 * 2 + 0 * 3 + 1 * 4 = 4
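One difference from case_when(): rows where no indicator is 1 (rows 27 to 31, the gear 5 cars) get 0 from the matrix product, whereas case_when() would give NA. A sketch of a hypothetical variant, combine_indices_mat, that masks those rows to NA so the two approaches agree:

```r
library(dplyr)

# Indicator data as in the question, built from the built-in mtcars
ind_data <- mtcars %>%
  mutate(g2 = ifelse(gear == 2, 1, 0),
         g3 = ifelse(gear == 3, 1, 0),
         g4 = ifelse(gear == 4, 1, 0)) %>%
  select(g2, g3, g4)

combine_indices_mat <- function(db, cols, vals) {
  ind <- as.matrix(db[, cols])
  # row i of the product is the sum over j of ind[i, j] * vals[j]
  out <- drop(ind %*% vals)
  # no indicator set: return NA instead of 0, mirroring case_when()
  out[rowSums(ind) == 0] <- NA
  db %>% mutate(new_col = out)
}

res <- combine_indices_mat(ind_data, c("g2", "g3", "g4"), c(2, 3, 4))
res[25:32, ]
```

Like the plain matrix version, this assumes at most one indicator per row is 1; with several, the products would be summed.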
Perhaps I'm looking at it wrong, but I think this can be done more efficiently with a join:
cols <- tibble(g2 = c(1, 0, 0), g3 = c(0, 1, 0), g4 = c(0, 0, 1), val = c(2, 3, 4))
cols
# # A tibble: 3 x 4
# g2 g3 g4 val
# <dbl> <dbl> <dbl> <dbl>
# 1 1 0 0 2
# 2 0 1 0 3
# 3 0 0 1 4
# using your mtcars
left_join(mtcars, cols, by = c("g2", "g3", "g4"))
# g2 g3 g4 val
# 1 0 0 1 4
# 2 0 0 1 4
# 3 0 0 1 4
# 4 0 1 0 3
# 5 0 1 0 3
# 6 0 1 0 3
# 7 0 1 0 3
# 8 0 0 1 4
# 9 0 0 1 4
# 10 0 0 1 4
# 11 0 0 1 4
# 12 0 1 0 3
# 13 0 1 0 3
# 14 0 1 0 3
# 15 0 1 0 3
# 16 0 1 0 3
# 17 0 1 0 3
# 18 0 0 1 4
# 19 0 0 1 4
# 20 0 0 1 4
# 21 0 1 0 3
# 22 0 1 0 3
# 23 0 1 0 3
# 24 0 1 0 3
# 25 0 1 0 3
# 26 0 0 1 4
# 27 0 0 0 NA
# 28 0 0 0 NA
# 29 0 0 0 NA
# 30 0 0 0 NA
# 31 0 0 0 NA
# 32 0 0 1 4
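The lookup table above is typed out by hand for three columns. A sketch of building it from cols and vals for any number of indicator columns, using the identity matrix from diag() so that row k has a 1 only in column k; make_lookup is a hypothetical helper name:

```r
library(dplyr)

make_lookup <- function(cols, vals) {
  lookup <- as.data.frame(diag(length(cols)))  # one row per indicator column
  names(lookup) <- cols
  lookup$val <- vals
  lookup
}

# Indicator data as in the question, built from the built-in mtcars
ind <- mtcars %>%
  mutate(g2 = ifelse(gear == 2, 1, 0),
         g3 = ifelse(gear == 3, 1, 0),
         g4 = ifelse(gear == 4, 1, 0)) %>%
  select(g2, g3, g4)

lookup <- make_lookup(c("g2", "g3", "g4"), c(2, 3, 4))
res <- left_join(ind, lookup, by = c("g2", "g3", "g4"))
head(res)
```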