dplyr mutate new dynamic variables with case_when
I'm aware of similar questions here and here, but I haven't been able to figure out the right solution for my specific situation. Some of what I'm finding are solutions which use mutate_, etc., but I understand these are now obsolete. I'm new to dynamic usages of dplyr.
I have a data frame which includes some variables with two different prefixes, alpha and beta:
df <- data.frame(alpha.num = c(1, 3, 5, 7),
alpha.char = c("a", "c", "e", "g"),
beta.num = c(2, 4, 6, 8),
beta.char = c("b", "d", "f", "h"),
which.to.use = c("alpha", "alpha", "beta", "beta"))
I want to create new variables with the prefix "chosen." which are copies of either the "alpha" or the "beta" columns, depending on which one is named for that row in the "which.to.use" column. The desired output would be:
desired.df <- data.frame(alpha.num = c(1, 3, 5, 7),
alpha.char = c("a", "c", "e", "g"),
beta.num = c(2, 4, 6, 8),
beta.char = c("b", "d", "f", "h"),
which.to.use = c("alpha", "alpha", "beta", "beta"),
chosen.num = c(1, 3, 6, 8),
chosen.char = c("a", "c", "f", "h"))
My failed attempt:
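varnames <- c("num", "char")
# %<>% is magrittr's assignment pipe (needs library(magrittr))
# Fails: the LHS of := must be a string or a symbol (the as.name() call is
# not evaluated there), and case_when() would return the strings
# "alpha.num", "alpha.char", ... rather than the columns themselves
df %<>%
  mutate(as.name(paste0("chosen.", varnames)) := case_when(
    which.to.use == "alpha" ~ paste0("alpha.", varnames),
    which.to.use == "beta" ~ paste0("beta.", varnames)
  ))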
I'd prefer a pure dplyr solution. Even better would be one that could be included in a longer pipe modifying the df (i.e. no need to stop to create "varnames"). Thanks for your help.
Using some fun rlang stuff & purrr:
library(rlang)
library(purrr)
library(dplyr)
df <- data.frame(alpha.num = c(1, 3, 5, 7),
                 alpha.char = c("a", "c", "e", "g"),
                 beta.num = c(2, 4, 6, 8),
                 beta.char = c("b", "d", "f", "h"),
                 which.to.use = c("alpha", "alpha", "beta", "beta"),
                 stringsAsFactors = FALSE)  # keep the char columns as character, not factor
# For each suffix, make a copy of df with one new "chosen.<suffix>" column,
# then full_join() the copies back together on their shared columns
c("num", "char") %>%
  map(~ mutate(df, !!sym(paste0("chosen.", .x)) :=
                 case_when(
                   which.to.use == "alpha" ~ !!sym(paste0("alpha.", .x)),
                   which.to.use == "beta"  ~ !!sym(paste0("beta.", .x))
                 ))) %>%
  reduce(full_join)
Result:
alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
1 1 a 2 b alpha 1 a
2 3 c 4 d alpha 3 c
3 5 e 6 f beta 6 f
4 7 g 8 h beta 8 h
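Here is a variant sketch of the same rlang idea that needs no join. reduce() with .init threads df through one mutate() per suffix, so the whole thing can sit in a pipe. add_chosen() is a hypothetical helper name and is not part of any package.

```r
library(dplyr)
library(purrr)
library(rlang)

df <- data.frame(alpha.num = c(1, 3, 5, 7),
                 alpha.char = c("a", "c", "e", "g"),
                 beta.num = c(2, 4, 6, 8),
                 beta.char = c("b", "d", "f", "h"),
                 which.to.use = c("alpha", "alpha", "beta", "beta"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: adds one "chosen.<s>" column to d
add_chosen <- function(d, s) {
  mutate(d, !!sym(paste0("chosen.", s)) := case_when(
    which.to.use == "alpha" ~ !!sym(paste0("alpha.", s)),
    which.to.use == "beta"  ~ !!sym(paste0("beta.", s))
  ))
}

# .init = . makes the piped df the starting value; each suffix then adds
# its column to the result of the previous step
res <- df %>%
  reduce(c("num", "char"), add_chosen, .init = .)
res
```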
Without reduce(full_join):
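c("num", "char") %>%
  map_dfc(~ mutate(df, !!sym(paste0("chosen.", .x)) :=
                     case_when(
                       which.to.use == "alpha" ~ !!sym(paste0("alpha.", .x)),
                       which.to.use == "beta"  ~ !!sym(paste0("beta.", .x))
                     ))) %>%
  # Drops the duplicated original columns, which older dplyr renamed with a
  # trailing "1"; dplyr >= 1.0 renames duplicates as e.g. "alpha.num...1"
  # instead, so this select() needs adjusting there
  select(-ends_with("1"))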
alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
1 1 a 2 b alpha 1 a
2 3 c 4 d alpha 3 c
3 5 e 6 f beta 6 f
4 7 g 8 h beta 8 h
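Here is a sketch that does not depend on how duplicate names get repaired. It assumes the same df as above. transmute() keeps only the new column, so map_dfc() binds just the chosen.* columns, and those are then attached to df.

```r
library(dplyr)
library(purrr)
library(rlang)

df <- data.frame(alpha.num = c(1, 3, 5, 7),
                 alpha.char = c("a", "c", "e", "g"),
                 beta.num = c(2, 4, 6, 8),
                 beta.char = c("b", "d", "f", "h"),
                 which.to.use = c("alpha", "alpha", "beta", "beta"),
                 stringsAsFactors = FALSE)

# transmute() returns only the new column, so there are no duplicate names to repair
chosen <- c("num", "char") %>%
  map_dfc(~ transmute(df, !!sym(paste0("chosen.", .x)) :=
                        case_when(
                          which.to.use == "alpha" ~ !!sym(paste0("alpha.", .x)),
                          which.to.use == "beta"  ~ !!sym(paste0("beta.", .x))
                        )))

res <- bind_cols(df, chosen)
res
```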
Explanation:
(Note: I don't fully, or even kind of, get rlang. Maybe others can give a better explanation ;).)
Using paste0 by itself produces a string, but mutate needs a bare name to know it is referring to a variable.
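Here is a minimal sketch of the string problem, using a cut-down two-row df for brevity. A plain string on the right-hand side of mutate() is just a constant value, so every row gets the text "alpha.num" instead of that column's values.

```r
library(dplyr)

df <- data.frame(alpha.num = c(1, 3),
                 which.to.use = c("alpha", "alpha"),
                 stringsAsFactors = FALSE)
x <- "num"

# paste0() returns the string "alpha.num"; mutate() recycles that string
# to every row instead of looking up the column it names
wrong <- mutate(df, chosen = paste0("alpha.", x))
wrong$chosen
```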
If we wrap paste0 in sym, it evaluates to a bare name:
> varnames <- c("num", "char")
> x <- varnames[1]
> sym(paste0("alpha.", x))
alpha.num
But mutate does not know to look that symbol up as a column; on its own, sym() just returns a symbol object:
> typeof(sym(paste0("alpha.", x)))
[1] "symbol"
The "bang bang" operator !! evaluates the sym() call right away and splices the resulting bare name into the expression before mutate sees it. Compare:
> expr(mutate(df, var = sym(paste0("alpha.", x))))
mutate(df, var = sym(paste0("alpha.", x)))
> expr(mutate(df, var = !!sym(paste0("alpha.", x))))
mutate(df, var = alpha.num)
So with !!sym we can use paste0 to build variable names dynamically in dplyr.
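Here is a small sketch putting this together for one variable, again with a cut-down df for brevity. It also shows the .data[[...]] pronoun, an alternative the answer above does not use, which looks a column up by its string name. The left-hand side of := also accepts a string unquoted with !!.

```r
library(dplyr)
library(rlang)

df <- data.frame(alpha.num = c(1, 3, 5, 7),
                 beta.num = c(2, 4, 6, 8),
                 which.to.use = c("alpha", "alpha", "beta", "beta"),
                 stringsAsFactors = FALSE)
x <- "num"

# !!sym() splices the bare names chosen.num and alpha.num into the call
res_sym <- mutate(df, !!sym(paste0("chosen.", x)) := !!sym(paste0("alpha.", x)))

# .data[[<string>]] looks the column up by name; !! unquotes a string for the new name
res_data <- mutate(df, !!paste0("chosen.", x) := .data[[paste0("alpha.", x)]])
res_sym
```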
A base R approach: use apply with MARGIN = 1 to select, for each row, the columns whose names match the value in the which.to.use column, and take that row's values from those columns.
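# apply() first converts df to a character matrix, so both new columns
# come back as character (chosen.num included)
df[c("chosen.num", "chosen.char")] <-
  t(apply(df, 1, function(x) x[grepl(x["which.to.use"], names(df))]))
df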
# alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
#1 1 a 2 b alpha 1 a
#2 3 c 4 d alpha 3 c
#3 5 e 6 f beta 6 f
#4 7 g 8 h beta 8 h
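Because apply() converts df to a character matrix first, chosen.num comes back as character in the approach above. Here is a type-preserving base R sketch (a swapped-in alternative, not the answer's method). It loops over the suffixes instead of the rows and uses mapply() to pull each row's value from the column named by which.to.use.

```r
df <- data.frame(alpha.num = c(1, 3, 5, 7),
                 alpha.char = c("a", "c", "e", "g"),
                 beta.num = c(2, 4, 6, 8),
                 beta.char = c("b", "d", "f", "h"),
                 which.to.use = c("alpha", "alpha", "beta", "beta"),
                 stringsAsFactors = FALSE)

for (s in c("num", "char")) {
  # for row i with prefix p, take element i of column "<p>.<s>";
  # mapply() simplifies the results to a vector of that column's type
  df[[paste0("chosen.", s)]] <- mapply(function(i, p) df[[paste0(p, ".", s)]][i],
                                       seq_len(nrow(df)), df$which.to.use)
}
df
```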
This is a nest()/map() strategy that should be pretty fast. It stays in the tidyverse, but doesn't go into rlang land.
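library(tidyverse)
# tidyr >= 1.0 / dplyr >= 1.0 syntax; older versions used nest(-which.to.use),
# rename_all(funs(...)) and a bare unnest()
df %>%
  nest(data = -which.to.use) %>%                        # one nested data frame per prefix
  mutate(new_data = map2(data, which.to.use,
                         ~ select(.x, matches(.y)) %>%   # keep only that prefix's columns
                           rename_with(function(nm) gsub(".*\\.", "chosen.", nm)))) %>%
  unnest(cols = c(data, new_data))
which.to.use alpha.num alpha.char beta.num beta.char chosen.num chosen.char
1 alpha 1 a 2 b 1 a
2 alpha 3 c 4 d 3 c
3 beta 5 e 6 f 6 f
4 beta 7 g 8 h 8 h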
It grabs all columns that are not which.to.use, not just num and char. But that seems like what you (I) would want IRL. If you wanted to pull only specific variables, you could add a select(matches("var1|var2|etc")) line before you call nest().
EDIT: My original suggestion of using select() to drop unneeded columns would mean doing a join later to bring them back. If instead you adjust the nest() parameters, you can do this on only certain columns.
I added new bool columns here, but they will be ignored when building the chosen columns:
new_df <- data.frame(alpha.num = c(1, 3, 5, 7),
alpha.char = c("a", "c", "e", "g"),
alpha.bool = FALSE,
beta.num = c(2, 4, 6, 8),
beta.char = c("b", "d", "f", "h"),
beta.bool = TRUE,
which.to.use = c("alpha", "alpha", "beta", "beta"),
stringsAsFactors = FALSE)
new_df %>%
  # tidyr >= 1.0 syntax: only columns matching this pattern get nested;
  # the others (alpha.bool, beta.bool, which.to.use) are kept as they are
  nest(data = matches("num|char")) %>%
  mutate(new_data = map2(data, which.to.use,
                         ~ select(.x, matches(.y)) %>%
                           rename_with(function(nm) gsub(".*\\.", "chosen.", nm)))) %>%
  unnest(cols = c(data, new_data))
alpha.bool beta.bool which.to.use alpha.num alpha.char beta.num beta.char chosen.num chosen.char
1 FALSE TRUE alpha 1 a 2 b 1 a
2 FALSE TRUE alpha 3 c 4 d 3 c
3 FALSE TRUE beta 5 e 6 f 6 f
4 FALSE TRUE beta 7 g 8 h 8 h
You can also try a gather/spread approach:
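library(tidyverse)
df %>%
  # integer row id: spread() sorts rows by the id columns, and a character
  # rowname would sort "10" before "2"
  rowid_to_column() %>%
  gather(k, v, -which.to.use, -rowid) %>%          # v is coerced to character here
  separate(k, into = c("k1", "k2"), sep = "[.]") %>%
  filter(which.to.use == k1) %>%
  mutate(k1 = "chosen") %>%
  unite(k, k1, k2, sep = ".") %>%
  spread(k, v, convert = TRUE) %>%                 # convert = TRUE turns chosen.num back into a number
  select(chosen.num, chosen.char) %>%
  bind_cols(df, .)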
alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
1 1 a 2 b alpha 1 a
2 3 c 4 d alpha 3 c
3 5 e 6 f beta 6 f
4 7 g 8 h beta 8 h
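gather() and spread() are superseded in tidyr >= 1.0. Here is a sketch of the same reshaping idea with pivot_longer(), swapped in here rather than taken from the answer. The special ".value" name splits alpha.num into a prefix "alpha" and a num column, so the column types survive the reshape. rowid is an assumed helper column.

```r
library(dplyr)
library(tidyr)

df <- data.frame(alpha.num = c(1, 3, 5, 7),
                 alpha.char = c("a", "c", "e", "g"),
                 beta.num = c(2, 4, 6, 8),
                 beta.char = c("b", "d", "f", "h"),
                 which.to.use = c("alpha", "alpha", "beta", "beta"),
                 stringsAsFactors = FALSE)

chosen <- df %>%
  mutate(rowid = row_number()) %>%             # remember each original row
  pivot_longer(-c(rowid, which.to.use),
               names_to = c("prefix", ".value"), # alpha.num -> prefix "alpha", column num
               names_sep = "\\.") %>%
  filter(prefix == which.to.use) %>%           # keep the prefix named in which.to.use
  arrange(rowid) %>%
  select(chosen.num = num, chosen.char = char)

res <- bind_cols(df, chosen)
res
```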
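Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.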