R: Efficient way to replace column values based on strings (possibly using case_when or some other form of mutate)?

Question

Is there a more efficient way to replace values in columns based on searching for strings within them?

prac <- data.frame(`External Placement` = c(NA, NA, "Spanish words We place outside of our company", NA, NA, "Spanish words We place outside of our company", "Spanish words We place outside of our company", "Spanish words We place outside of our company", NA, "Spanish words We place outside of our company"),
                   `Internal Placement` = c(NA, NA, "Spanish words we place inside of our organisation", NA, "Spanish words we place inside of our organisation", "Spanish words we place inside of our organisation", NA, NA, NA, NA),
                   `None of the above` = c("Ninguno none", "Ninguno none", NA, NA, NA, NA, NA, NA, NA, NA),
                   check.names = FALSE)  # keep the spaces in the column names

This is the sample data. I used a series of ifelse calls to get the values I want, but I spent a long time trying to find a way to do it with mutate, case_when, and so forth, with less repetition. Is there a better way? This is what I have so far:

# Note: `Vocational Training` is not part of the sample data above.
prac$`Vocational Training` <- ifelse(grepl("vocational training$", prac$`Vocational Training`),
                                     "Vocational training", NA)
prac$`External Placement` <- ifelse(grepl("outside of our company$", prac$`External Placement`),
                                    "External placement", NA)
prac$`Internal Placement` <- ifelse(grepl("inside our organisation$", prac$`Internal Placement`),
                                    "Internal placement", NA)
prac$`None of the above` <- ifelse(grepl("^Ninguno", prac$`None of the above`), "None or other", NA)

Answer 1

You could try the code below:

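# One regex per column, and the label to write where that regex matches
v <- c("outside of our company$", "inside our organisation$", "^Ninguno")
r <- c("External placement", "Internal placement", "None or other")

# Inner mapply: grepl(v[i], prac[[i]]) for each column, giving a logical matrix.
# Outer mapply: replace TRUE with the column's label and FALSE with NA.
prac[] <- mapply(function(x, y) ifelse(x, y, NA),
                 data.frame(mapply(grepl, v, prac)),
                 r)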

which gives:

> prac
   External.Placement Internal.Placement None.of.the.above
1                <NA>               <NA>     None or other
2                <NA>               <NA>     None or other
3  External placement               <NA>              <NA>
4                <NA>               <NA>              <NA>
5                <NA>               <NA>              <NA>
6  External placement               <NA>              <NA>
7  External placement               <NA>              <NA>
8  External placement               <NA>              <NA>
9                <NA>               <NA>              <NA>
10 External placement               <NA>              <NA>

DATA

prac <- structure(list(External.Placement = structure(c(NA, NA, 1L, NA, 
NA, 1L, 1L, 1L, NA, 1L), .Label = "Spanish words We place outside of our company", class = "factor"), 
    Internal.Placement = structure(c(NA, NA, 1L, NA, 1L, 1L, 
    NA, NA, NA, NA), .Label = "Spanish words we place inside of our organisation", class = "factor"), 
    None.of.the.above = structure(c(1L, 1L, NA, NA, NA, NA, NA, 
    NA, NA, NA), .Label = "Ninguno none", class = "factor")), class = "data.frame", row.names = c(NA, 
-10L))
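
The nested mapply call in this answer can be hard to read at a glance. Below is a minimal sketch of the same idea split into two steps with Map, using base R only. The names hits and labelled are hypothetical, and the sample data is rebuilt as plain character columns (an assumption, not the factor structure shown above).

```r
# Sample data as plain character columns (assumption: no factors).
ext <- "Spanish words We place outside of our company"
int <- "Spanish words we place inside of our organisation"
prac <- data.frame(
  External.Placement = c(NA, NA, ext, NA, NA, ext, ext, ext, NA, ext),
  Internal.Placement = c(NA, NA, int, NA, int, int, NA, NA, NA, NA),
  None.of.the.above  = c("Ninguno none", "Ninguno none", rep(NA, 8))
)

v <- c("outside of our company$", "inside our organisation$", "^Ninguno")
r <- c("External placement", "Internal placement", "None or other")

# Step 1: one logical vector per column; grepl() returns FALSE for NA input.
hits <- Map(grepl, v, prac)

# Step 2: keep the label where the pattern matched, NA everywhere else.
labelled <- Map(function(hit, label) ifelse(hit, label, NA_character_), hits, r)

# Assign column by column; unname() so the original column names are kept.
prac[] <- unname(labelled)

# Which patterns matched at least once? The second one does not, because
# the sample text reads "inside of our organisation" (with an extra "of").
sapply(hits, any)
```

Inspecting hits before assigning makes it easy to see why Internal.Placement is all NA in the output above: the pattern "inside our organisation$" never matches the sample text.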

Answer 2

It depends on whether the efficiency you want is computing efficiency (time to compute) or programming efficiency (number of lines of code).

In terms of programming efficiency, it will be very hard to beat a data.table solution. I can suggest this one:

prac <- data.frame(`External Placement` = c(NA, NA, "Spanish words We place outside of our company", NA, NA, "Spanish words We place outside of our company", "Spanish words We place outside of our company", "Spanish words We place outside of our company", NA, "Spanish words We place outside of our company"),
                   `Internal Placement` = c(NA, NA, "Spanish words we place inside of our organisation", NA, "Spanish words we place inside of our organisation", "Spanish words we place inside of our organisation", NA, NA, NA, NA),
                   `None of the Above` = c("Ninguno none", "Ninguno none", NA, NA, NA, NA, NA, NA, NA, NA))

library(data.table)

list_conditions <- c(#"Vocational.Training" = "vocational training$",
  "External.Placement" = "outside of our company$",
  "Internal.Placement" = "inside our organisation$",
  "None.of.the.Above" = "^Ninguno")


dt <- data.table(prac)

My idea is to use a named vector of conditions, where each name is the column the condition applies to. Then an lapply call updates the data.table by reference (here you lose some programming efficiency, but you get very good computing efficiency):

dt <- lapply(seq_along(list_conditions), function(i) {

  var <- names(list_conditions)[i]   # column to update
  cond <- list_conditions[i]         # regex for that column
  val <- gsub("\\.", " ", var)       # label: column name with dots turned into spaces
  dt[, 'tempcol' := NA_character_]
  dt[grepl(as.character(cond), get(var)), tempcol := as.character(val)]
  dt[, c(var) := tempcol]            # overwrite the original column by reference

})
dt <- dt[[length(dt)]]
dt[, 'tempcol' := NULL]

The line dt <- dt[[length(dt)]] is there because lapply returns a list, but we are only interested in its last element (the final state of the data.table). You can generalize this program if you prefer creating new columns rather than overwriting existing ones.

The output is:

dt
    External.Placement Internal.Placement None.of.the.Above
 1:               <NA>               <NA> None of the Above
 2:               <NA>               <NA> None of the Above
 3: External Placement               <NA>              <NA>
 4:               <NA>               <NA>              <NA>
 5:               <NA>               <NA>              <NA>
 6: External Placement               <NA>              <NA>
 7: External Placement               <NA>              <NA>
 8: External Placement               <NA>              <NA>
 9:               <NA>               <NA>              <NA>
10: External Placement               <NA>              <NA>
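
The generalization mentioned above, creating new columns instead of overwriting the existing ones, might look roughly like the sketch below. It assumes data.table is installed, uses a two-row toy table rather than the full sample, and the "_label" suffix is a hypothetical naming choice, not something from the original answer.

```r
library(data.table)

# Toy data (assumption: a cut-down version of the sample, two columns only).
dt <- data.table(
  External.Placement = c(NA, "Spanish words We place outside of our company"),
  None.of.the.Above  = c("Ninguno none", NA)
)

list_conditions <- c(
  "External.Placement" = "outside of our company$",
  "None.of.the.Above"  = "^Ninguno"
)

# Loop over the named patterns and add one "<column>_label" column for each.
for (var in names(list_conditions)) {
  hit   <- grepl(list_conditions[[var]], dt[[var]])  # FALSE for NA input
  label <- gsub("\\.", " ", var)                     # dots become spaces
  # set() adds the column by reference; "_label" is a hypothetical suffix.
  set(dt, j = paste0(var, "_label"), value = ifelse(hit, label, NA_character_))
}

dt
```

A plain for loop with set() avoids both the temporary column and the need to pull the last element out of a list, while still updating the table by reference.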
