
dplyr mutate new dynamic variables with case_when

I'm aware of similar questions here and here, but I haven't been able to figure out the right solution for my specific situation. Some of the solutions I'm finding use mutate_, etc., but I understand these are now obsolete. I'm new to dynamic usages of dplyr.

I have a data frame that includes some variables with two different prefixes, alpha and beta:

df <- data.frame(alpha.num = c(1, 3, 5, 7),
             alpha.char = c("a", "c", "e", "g"),
             beta.num = c(2, 4, 6, 8),
             beta.char = c("b", "d", "f", "h"),
             which.to.use = c("alpha", "alpha", "beta", "beta"))

I want to create new variables with the prefix "chosen." that are copies of either the "alpha" or the "beta" columns, depending on which prefix is named for that row in the "which.to.use" column. The desired output would be:

desired.df <- data.frame(alpha.num = c(1, 3, 5, 7),
                     alpha.char = c("a", "c", "e", "g"),
                     beta.num = c(2, 4, 6, 8),
                     beta.char = c("b", "d", "f", "h"),
                     which.to.use = c("alpha", "alpha", "beta", "beta"),
                     chosen.num = c(1, 3, 6, 8),
                     chosen.char = c("a", "c", "f", "h"))

My failed attempt:

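library(dplyr)
library(magrittr)  # for the %<>% assignment pipe

varnames <- c("num", "char")
# Does not work: the new name on the left of := is not unquoted, and
# paste0() on the right returns the strings "alpha.num" etc., not the columns
df %<>%
  mutate(as.name(paste0("chosen.", varnames)) := case_when(
    which.to.use == "alpha" ~ paste0("alpha.", varnames),
    which.to.use == "beta" ~ paste0("beta.", varnames)
  ))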

I'd prefer a pure dplyr solution, and even better would be one that could be included in a longer pipe modifying df (i.e., with no need to stop to create varnames). Thanks for your help.

Using some fun rlang stuff and purrr:

library(rlang)
library(purrr)
library(dplyr)

df <- data.frame(alpha.num = c(1, 3, 5, 7),
                 alpha.char = c("a", "c", "e", "g"),
                 beta.num = c(2, 4, 6, 8),
                 beta.char = c("b", "d", "f", "h"),
                 which.to.use = c("alpha", "alpha", "beta", "beta"),
                 stringsAsFactors = FALSE)

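c("num", "char") %>% 
    # one copy of df per suffix, each with its own chosen.<suffix> column
    map(~ mutate(df, !!sym(paste0("chosen.", .x)) := 
      case_when(
          which.to.use == "alpha" ~ !!sym(paste0("alpha.", .x)),
          which.to.use == "beta" ~ !!sym(paste0("beta.", .x))
                ))) %>% 
    # join the copies back together on the original columns they share
    reduce(full_join)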

Result:

  alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
1         1          a        2         b        alpha          1           a
2         3          c        4         d        alpha          3           c
3         5          e        6         f         beta          6           f
4         7          g        8         h         beta          8           h

Without reduce(full_join):

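c("num", "char") %>% 
  # transmute() keeps only the new chosen.<suffix> column from each iteration,
  # so there are no duplicated original columns to drop afterwards
  map_dfc(~ transmute(df, !!sym(paste0("chosen.", .x)) := 
                 case_when(
                   which.to.use == "alpha" ~ !!sym(paste0("alpha.", .x)),
                   which.to.use == "beta" ~ !!sym(paste0("beta.", .x))
                 ))) %>% 
  bind_cols(df, .)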



  alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
1         1          a        2         b        alpha          1           a
2         3          c        4         d        alpha          3           c
3         5          e        6         f         beta          6           f
4         7          g        8         h         beta          8           h
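Both versions above build one result per suffix and then combine them. If you want a single step that can sit in the middle of a longer pipe, another option is to thread the data frame through one mutate() per suffix with purrr's reduce() and its .init argument, so no join or column binding is needed. A minimal sketch, assuming the alpha/beta prefixes and the which.to.use column from the question; add_chosen() is a hypothetical helper name, not part of any package:

```r
library(rlang)
library(purrr)
library(dplyr)

# Hypothetical helper: adds one "chosen.<suffix>" column per suffix,
# copying alpha.<suffix> or beta.<suffix> depending on which.to.use
add_chosen <- function(.df, suffixes) {
  reduce(suffixes, function(d, s) {
    mutate(d, !!sym(paste0("chosen.", s)) := case_when(
      which.to.use == "alpha" ~ !!sym(paste0("alpha.", s)),
      which.to.use == "beta"  ~ !!sym(paste0("beta.", s))
    ))
  }, .init = .df)
}

df <- data.frame(alpha.num = c(1, 3, 5, 7),
                 alpha.char = c("a", "c", "e", "g"),
                 beta.num = c(2, 4, 6, 8),
                 beta.char = c("b", "d", "f", "h"),
                 which.to.use = c("alpha", "alpha", "beta", "beta"),
                 stringsAsFactors = FALSE)

res <- df %>% add_chosen(c("num", "char"))
```

Each suffix only adds a column, so row order and column types are kept, and handling another variable is just another element of the suffix vector.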

Explanation:
(Note: I do not fully, or even kind of, get rlang. Maybe others can give a better explanation ;).)

Using paste0() by itself produces a string, but mutate needs a bare name to know that it is referring to a variable.

If we wrap paste0() in sym(), it evaluates to a bare name:

> varnames <- c("num", "char")
> x <- varnames[1]
> sym(paste0("alpha.", x))
alpha.num

But on its own this is only a symbol object, and mutate will not treat it as a column reference:

> typeof(sym(paste0("alpha.", x)))
[1] "symbol"

The "bang bang" operator !! evaluates the sym() call immediately and injects the resulting name into the expression that mutate receives. Compare:

> expr(mutate(df, var = sym(paste0("alpha.", x))))
mutate(df, var = sym(paste0("alpha.", x)))

> expr(mutate(df, var = !!sym(paste0("alpha.", x))))
mutate(df, var = alpha.num)

So with !!sym() we can use paste0() to build variable names dynamically and refer to them in dplyr verbs.
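A small sketch of the difference, using a two-row data frame made up for illustration (col, as_string and as_symbol are just example names): a plain string inside mutate() is taken as a value, while !!sym() turns it into a column reference, and the same trick names the new column on the left of :=.

```r
library(dplyr)
library(rlang)

# Toy data made up for illustration
df <- data.frame(alpha.num = c(1, 3), beta.num = c(2, 4))
col <- paste0("alpha.", "num")  # just a string: "alpha.num"

# The string is used as a constant value and recycled down the new column
as_string <- mutate(df, picked = col)

# !!sym() injects the bare name alpha.num, so mutate() copies that column;
# on the left of := it injects the name of the new column
as_symbol <- mutate(df, !!sym(paste0("chosen.", "num")) := !!sym(col))
```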

A base R approach: use apply() with MARGIN = 1 so that, for each row, we select the columns whose names match that row's value in which.to.use and take the row's values from those columns.

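# For each row, keep the values from the columns whose names contain
# that row's which.to.use value (e.g. alpha.num and alpha.char)
df[c("chosen.num", "chosen.char")] <- 
          t(apply(df, 1, function(x) x[grepl(x["which.to.use"], names(df))]))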

df
#  alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
#1         1          a        2         b        alpha          1           a
#2         3          c        4         d        alpha          3           c
#3         5          e        6         f         beta          6           f
#4         7          g        8         h         beta          8           h
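One caveat: apply() first converts df to a matrix, and because df mixes numbers and strings that matrix is character, so chosen.num comes back as character even though it prints the same. A vectorized base R sketch that keeps each column's type loops over suffixes and prefixes instead of rows. choose_columns() is a hypothetical name, and the sketch assumes the <prefix>.<suffix> naming from the question and character (not factor) columns:

```r
# Hypothetical helper (not from any package): for each suffix, build
# "<new_prefix>.<suffix>" by copying "<prefix>.<suffix>" on the rows whose
# selector column names that prefix. Column types are preserved.
choose_columns <- function(df, selector, suffixes, new_prefix = "chosen") {
  sel <- as.character(df[[selector]])
  prefixes <- unique(sel[!is.na(sel)])
  for (s in suffixes) {
    out <- df[[paste0(prefixes[1], ".", s)]]
    out[] <- NA                      # same type as the source column, all missing
    for (p in prefixes) {
      rows <- sel %in% p             # %in% treats NA selectors as FALSE
      out[rows] <- df[[paste0(p, ".", s)]][rows]
    }
    df[[paste0(new_prefix, ".", s)]] <- out
  }
  df
}

df <- data.frame(alpha.num = c(1, 3, 5, 7),
                 alpha.char = c("a", "c", "e", "g"),
                 beta.num = c(2, 4, 6, 8),
                 beta.char = c("b", "d", "f", "h"),
                 which.to.use = c("alpha", "alpha", "beta", "beta"),
                 stringsAsFactors = FALSE)

res <- choose_columns(df, "which.to.use", c("num", "char"))
class(res$chosen.num)  # "numeric"
```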

This is a nest()/map() strategy that should be pretty fast. It stays in the tidyverse but doesn't go into rlang land.

library(tidyverse)

df %>% 
    nest(data = -which.to.use) %>%   # one nested data frame per which.to.use value (rows come back grouped by it)
    mutate(new_data = map2(data, which.to.use,
                           ~ select(.x, matches(.y)) %>%   # keep only the alpha.* or beta.* columns
                               rename_with(function(nm) gsub(".*\\.", "chosen.", nm)))) %>%
    unnest(c(data, new_data))

  which.to.use alpha.num alpha.char beta.num beta.char chosen.num chosen.char
1        alpha         1          a        2         b          1           a
2        alpha         3          c        4         d          3           c
3         beta         5          e        6         f          6           f
4         beta         7          g        8         h          8           h

It grabs all columns that are not which.to.use, not just num and char. But that seems like what you (I) would want in real life. If you wanted to pull only specific variables, you could add a select(matches("var1|var2|etc")) line before calling nest().

EDIT: My original suggestion of using select() to drop unneeded columns would mean doing a join to bring them back later. If you instead adjust the nest() arguments, you can apply this to only certain columns.

I added new bool columns here, but they are ignored by the "chosen" selection:

new_df <- data.frame(alpha.num = c(1, 3, 5, 7),
                 alpha.char = c("a", "c", "e", "g"),
                 alpha.bool = FALSE,
                 beta.num = c(2, 4, 6, 8),
                 beta.char = c("b", "d", "f", "h"),
                 beta.bool = TRUE,
                 which.to.use = c("alpha", "alpha", "beta", "beta"),
                 stringsAsFactors = FALSE)

new_df %>% 
    nest(data = matches("num|char")) %>% # only columns that match this pattern get nested, so the others are kept as-is
    mutate(new_data = map2(data, which.to.use,
                           ~ select(.x, matches(.y)) %>%
                               rename_with(function(nm) gsub(".*\\.", "chosen.", nm)))) %>%
    unnest(c(data, new_data))

  alpha.bool beta.bool which.to.use alpha.num alpha.char beta.num beta.char chosen.num chosen.char
1      FALSE      TRUE        alpha         1          a        2         b          1           a
2      FALSE      TRUE        alpha         3          c        4         d          3           c
3      FALSE      TRUE         beta         5          e        6         f          6           f
4      FALSE      TRUE         beta         7          g        8         h          8           h

You can also try a gather/spread approach:

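library(tidyverse)

df %>% 
  rowid_to_column() %>%                    # integer id, so spread() keeps the original row order
  gather(k, v, -which.to.use, -rowid) %>%  # v is character because num and char columns are stacked
  separate(k, into = c("k1", "k2"), sep = "[.]") %>% 
  filter(which.to.use == k1) %>% 
  mutate(k1 = "chosen") %>% 
  unite(k, k1, k2, sep = ".") %>% 
  spread(k, v, convert = TRUE) %>%         # convert = TRUE turns chosen.num back into a number
  select(chosen.num, chosen.char) %>% 
  bind_cols(df, .)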
    alpha.num alpha.char beta.num beta.char which.to.use chosen.num chosen.char
 1         1          a        2         b        alpha          1           a
 2         3          c        4         d        alpha          3           c
 3         5          e        6         f         beta          6           f
 4         7          g        8         h         beta          8           h
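gather() and spread() are superseded in tidyr 1.0 and later by pivot_longer() and pivot_wider(). A sketch of the same idea with the newer verbs, assuming tidyr >= 1.0 and dplyr >= 1.0 (for across()); numbers and strings have to share one value column, so everything is turned into character first and chosen.num is converted back at the end:

```r
library(tidyverse)

df <- data.frame(alpha.num = c(1, 3, 5, 7),
                 alpha.char = c("a", "c", "e", "g"),
                 beta.num = c(2, 4, 6, 8),
                 beta.char = c("b", "d", "f", "h"),
                 which.to.use = c("alpha", "alpha", "beta", "beta"),
                 stringsAsFactors = FALSE)

chosen <- df %>%
  # numbers and strings must share one value column, so make them all character
  mutate(across(-which.to.use, as.character)) %>%
  rowid_to_column() %>%
  # one row per (row, column); "alpha.num" is split into prefix "alpha", var "num"
  pivot_longer(-c(rowid, which.to.use),
               names_to = c("prefix", "var"), names_sep = "\\.") %>%
  # keep only the prefix named in which.to.use for that row
  filter(prefix == which.to.use) %>%
  select(-prefix) %>%
  # back to one row per original row, as chosen.num / chosen.char
  pivot_wider(names_from = var, values_from = value, names_prefix = "chosen.") %>%
  mutate(chosen.num = as.numeric(chosen.num)) %>%
  select(chosen.num, chosen.char)

res <- bind_cols(df, chosen)
```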
