Using dplyr mutate_at with a case_when statement to change a specified list of variables

Question

I'm trying to recode some columns in a data set. The columns have a lot of odd names like S3__8 or C4__2. There are also some categorical columns that start with C, such as Case, which I want to leave alone.

I used this snippet to successfully recode all of the S columns:

Sa_Recode <- Sa %>%
  mutate_at(vars(starts_with("S")),
    funs(case_when(grepl("Yes", ., ignore.case = TRUE) ~ "1",
                   grepl("No", ., ignore.case = TRUE) ~ "0",
                   grepl("Some", ., ignore.case = TRUE) ~ "0.5",
                   TRUE                                         ~ "Else")))

I want to recode the C columns, but can't use the same logic because some of my other columns also start with C. I've tried editing the mutate line in the following ways, with no luck:

Creating a list of the columns I need:

list <- c('C1_(*)__', 'C2_4__', 'C3_(*)__', 'C3a_(*)__') 
mutate_at(vars(list),

Listing them as variables:

mutate_at(c('C1_(*)__', 'C2_4__', 'C3_(*)__', 'C3a_(*)__'),

Listing them differently as variables:

mutate_at(vars(c('C1_(*)__', 'C2_4__', 'C3_(*)__', 'C3a_(*)__')),

Calling a range of columns:

mutate_at(Sa[,8:53],

I'll be repeating this process with about nine other sets (with different starting letters) and am hoping to learn how to manipulate the logic. Alternatively, is there a way to make the "else" in the case statement not recode the value? That would also fix the issue. Thanks!

Sample Input:
Case  S25_    S26_(*)__   C1_(*)__
A     No      Some        Yes
B     Yes     Skipped     Yes
C     No      N/A         Some

Desired output:
Case  S25_    S26_(*)__   C1_(*)__
A     0       0.5         1
B     1       Skipped     1
C     0       N/A         0.5

Answer 1

You can use a regular expression to identify exactly the columns that you want to change.

library(dplyr)
Sa %>%
  mutate_at(vars(matches('^S|C\\d+')),
             ~case_when(grepl("Yes", ., ignore.case = TRUE) ~ "1",
                        grepl("No", ., ignore.case = TRUE) ~ "0",
                        grepl("Some", ., ignore.case = TRUE) ~ "0.5",
                        TRUE ~ "Else"))

This selects columns whose names start with "S" or contain "C" followed by a digit.
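To see which names a pattern picks up before using it in a pipeline, it can be tried against the column names directly. The sketch below uses base R `grepl()` on a hypothetical vector of names modelled on the ones in the question; `ignore.case = TRUE` is set because that is the default of `matches()`.

```r
# Hypothetical column names, modelled on the ones in the question
cols <- c("Case", "S25_", "S26_(*)__", "C1_(*)__", "C2_4__", "C3a_(*)__")

# Same pattern as in the answer; ignore.case = TRUE mirrors the default of matches()
hit <- grepl("^S|C\\d+", cols, ignore.case = TRUE)

# Names that the pattern would select
selected <- cols[hit]
print(selected)
```

"Case" is not selected because its "C" is followed by a letter rather than a digit. Note that only the "S" alternative is anchored with `^`, so a name with "C" and a digit anywhere in it would also match.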

Also, mutate_at has been superseded by across (dplyr 1.0.0 and later), so you can now use:

Sa %>%
   mutate(across(matches('^S|C\\d+'),
            ~case_when(grepl("Yes", ., ignore.case = TRUE) ~ "1",
                       grepl("No", ., ignore.case = TRUE) ~ "0",
                       grepl("Some", ., ignore.case = TRUE) ~ "0.5",
                       TRUE ~ "Else")))
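The question also asks whether the "else" branch can leave a value unchanged, and whether an explicit list of columns can be used instead of a pattern. A minimal sketch of both, assuming dplyr 1.0.0 or later: the data frame is rebuilt from the sample input, and `recode_answers` and `cols_to_recode` are hypothetical names introduced here. Returning `as.character(x)` in the final branch keeps values such as "Skipped" as they are, and `all_of()` selects columns by exact name.

```r
library(dplyr)

# Sample input from the question, rebuilt as a data frame
# (check.names = FALSE keeps names such as "S26_(*)__" unchanged)
Sa <- data.frame(
  Case        = c("A", "B", "C"),
  `S25_`      = c("No", "Yes", "No"),
  `S26_(*)__` = c("Some", "Skipped", "N/A"),
  `C1_(*)__`  = c("Yes", "Yes", "Some"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: recode Yes/No/Some, otherwise keep the original value
recode_answers <- function(x) {
  case_when(
    grepl("Yes",  x, ignore.case = TRUE) ~ "1",
    grepl("No",   x, ignore.case = TRUE) ~ "0",
    grepl("Some", x, ignore.case = TRUE) ~ "0.5",
    TRUE ~ as.character(x)
  )
}

# Exact column names to recode; "Case" is deliberately left out
cols_to_recode <- c("S25_", "S26_(*)__", "C1_(*)__")

Sa_Recode <- Sa %>%
  mutate(across(all_of(cols_to_recode), recode_answers))

print(Sa_Recode)
```

Because the helper is a plain function, the same call can be reused for the other column sets by swapping in a different name vector or a `matches()` pattern.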

Answer 1: accepted, 2020-07-17 14:41:01
