R: Efficient way to replace column values based on strings (maybe with case_when or some other form of mutate)?

Is there a more efficient way to replace values in columns based on searching for strings within them?

prac <- data.frame(`External Placement` = c(NA, NA, "Spanish words We place outside of our company", NA, NA, "Spanish words We place outside of our company", "Spanish words We place outside of our company", "Spanish words We place outside of our company", NA, "Spanish words We place outside of our company"),
             `Internal Placement` = c(NA, NA, "Spanish words we place inside of our organisation", NA, "Spanish words we place inside of our organisation", "Spanish words we place inside of our organisation", NA, NA, NA, NA),
             `None of the above` = c("Ninguno none", "Ninguno none", NA, NA, NA, NA, NA, NA, NA, NA))

This is the sample data. I have used a series of ifelse() calls to get the values I want, but I spent a long time trying to find a way to do the same with mutate(), case_when() and so forth, with less repetition. Is there a better way? This is what I did so far:

# data.frame() turns the backticked names into External.Placement etc.,
# so the columns are referenced by those names here.
# (The sample data has no Vocational Training column.)
# prac$Vocational.Training <- ifelse(grepl("vocational training$", prac$Vocational.Training),
#                                    "Vocational training", NA)
prac$External.Placement <- ifelse(grepl("outside of our company$", prac$External.Placement),
                                  "External placement", NA)
prac$Internal.Placement <- ifelse(grepl("inside of our organisation$", prac$Internal.Placement),
                                  "Internal placement", NA)
prac$None.of.the.above <- ifelse(grepl("^Ninguno", prac$None.of.the.above), "None or other", NA)
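For comparison, here is a hedged sketch of the mutate()/case_when() route the question asks about, assuming dplyr 1.0 or later (for across() and cur_column()). The patterns and replacement values live in named vectors (`patterns` and `new_values` are illustrative names), and a single across() call looks up the pattern and value for whichever column it is currently processing, so adding a column means adding one entry to each vector rather than another ifelse() line. The Internal.Placement pattern is written as `inside of our organisation$` because that is how the sample text reads.

```r
library(dplyr)

# Sample data with the syntactic names data.frame() produces by default
ext_txt <- "Spanish words We place outside of our company"
int_txt <- "Spanish words we place inside of our organisation"
prac <- data.frame(
  External.Placement = c(NA, NA, ext_txt, NA, NA, ext_txt, ext_txt, ext_txt, NA, ext_txt),
  Internal.Placement = c(NA, NA, int_txt, NA, int_txt, int_txt, NA, NA, NA, NA),
  None.of.the.above  = c("Ninguno none", "Ninguno none", rep(NA, 8)),
  stringsAsFactors = FALSE
)

# One pattern and one replacement value per column, keyed by column name (illustrative)
patterns   <- c(External.Placement = "outside of our company$",
                Internal.Placement = "inside of our organisation$",
                None.of.the.above  = "^Ninguno")
new_values <- c(External.Placement = "External placement",
                Internal.Placement = "Internal placement",
                None.of.the.above  = "None or other")

# cur_column() gives the name of the column across() is working on,
# which is used to pick its pattern and value; NA text never matches, so it stays NA
prac_clean <- prac %>%
  mutate(across(all_of(names(patterns)),
                ~ case_when(
                    grepl(patterns[[cur_column()]], .x) ~ new_values[[cur_column()]],
                    TRUE ~ NA_character_
                  )))
print(prac_clean)
```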

Maybe you can try code like the following:

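v <- c("outside of our company$","inside our organisation$","^Ninguno")  # one pattern per column, in column order
r <- c("External placement","Internal placement","None or other")        # one label per column, same order

# inner mapply(): grepl(v[i], prac[[i]]) for each column -> 10 x 3 logical matrix
# outer mapply(): turns TRUE into the label r[i] and FALSE into NA, column by column
# prac[] <- keeps prac as a data frame with its original column names
prac[]<- mapply(function(x,y) ifelse(x,y,NA),
                data.frame(mapply(grepl, v,prac)),
                r)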

which gives

> prac
   External.Placement Internal.Placement None.of.the.above
1                <NA>               <NA>     None or other
2                <NA>               <NA>     None or other
3  External placement               <NA>              <NA>
4                <NA>               <NA>              <NA>
5                <NA>               <NA>              <NA>
6  External placement               <NA>              <NA>
7  External placement               <NA>              <NA>
8  External placement               <NA>              <NA>
9                <NA>               <NA>              <NA>
10 External placement               <NA>              <NA>

DATA

prac <- structure(list(External.Placement = structure(c(NA, NA, 1L, NA, 
NA, 1L, 1L, 1L, NA, 1L), .Label = "Spanish words We place outside of our company", class = "factor"), 
    Internal.Placement = structure(c(NA, NA, 1L, NA, 1L, 1L, 
    NA, NA, NA, NA), .Label = "Spanish words we place inside of our organisation", class = "factor"), 
    None.of.the.above = structure(c(1L, 1L, NA, NA, NA, NA, NA, 
    NA, NA, NA), .Label = "Ninguno none", class = "factor")), class = "data.frame", row.names = c(NA, 
-10L))
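A base R variant of the same idea, as a hedged sketch: instead of relying on v and r being in the same order as the columns of prac, the patterns and replacement values are named vectors (`patterns`, `new_values` and `cols` are illustrative names), and Map() is applied to the columns picked out by those names. The Internal.Placement pattern is written as `inside of our organisation$`, because the sample text contains "inside of our organisation"; the `inside our organisation$` pattern in v never matches it, which is why Internal.Placement is all <NA> in the output above.

```r
# Sample data with the syntactic names data.frame() produces by default
ext_txt <- "Spanish words We place outside of our company"
int_txt <- "Spanish words we place inside of our organisation"
prac <- data.frame(
  External.Placement = c(NA, NA, ext_txt, NA, NA, ext_txt, ext_txt, ext_txt, NA, ext_txt),
  Internal.Placement = c(NA, NA, int_txt, NA, int_txt, int_txt, NA, NA, NA, NA),
  None.of.the.above  = c("Ninguno none", "Ninguno none", rep(NA, 8)),
  stringsAsFactors = FALSE
)

# Patterns and replacement values keyed by column name (illustrative names)
patterns   <- c(External.Placement = "outside of our company$",
                Internal.Placement = "inside of our organisation$",
                None.of.the.above  = "^Ninguno")
new_values <- c(External.Placement = "External placement",
                Internal.Placement = "Internal placement",
                None.of.the.above  = "None or other")

cols <- names(patterns)
# Indexing columns, patterns and values by the same cols vector keeps them aligned,
# so the column order of prac no longer matters
prac[cols] <- Map(function(x, pat, val) ifelse(grepl(pat, x), val, NA_character_),
                  prac[cols], patterns[cols], new_values[cols])
print(prac)
```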

It depends on whether the efficiency you want is computing efficiency (time to compute) or programming efficiency (number of lines written).

In terms of programming efficiency, it will be very hard to beat a data.table solution. I can propose this one:

prac <- data.frame(`External Placement` = c(NA, NA, "Spanish words We place outside of our company", NA, NA, "Spanish words We place outside of our company", "Spanish words We place outside of our company", "Spanish words We place outside of our company", NA, "Spanish words We place outside of our company"),
                   `Internal Placement` = c(NA, NA, "Spanish words we place inside of our organisation", NA, "Spanish words we place inside of our organisation", "Spanish words we place inside of our organisation", NA, NA, NA, NA),
                   `None of the Above` = c("Ninguno none", "Ninguno none", NA, NA, NA, NA, NA, NA, NA, NA))

library(data.table)

list_conditions <- c(#"Vocational.Training" = "vocational training$",
  "External.Placement" = "outside of our company$",
  "Internal.Placement" = "inside our organisation$",
  "None.of.the.Above" = "^Ninguno")


dt <- data.table(prac)

My idea is to use a vector of conditions whose names are the variables each condition should apply to. Then an lapply() updates your data table by reference (here you lose some programming efficiency, but you get very good computing efficiency).

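dt <- lapply(seq_along(list_conditions), function(i) {

  var  <- names(list_conditions)[i]   # column to update, e.g. "External.Placement"
  cond <- list_conditions[i]          # regex to search for in that column
  val  <- gsub("\\.", " ", var)       # label written on a match, e.g. "External Placement"
  dt[, 'tempcol' := NA_character_]    # scratch column, added by reference
  dt[grepl(as.character(cond), get(var)), tempcol := as.character(val)]
  dt[, c(var) := tempcol]             # overwrite the original column

})
dt <- dt[[length(dt)]]                # keep the data.table returned by the last update
dt[, 'tempcol' := NULL]               # drop the scratch column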

The line dt <- dt[[length(dt)]] is there because lapply() returns a list, and we are only interested in its last element (the data table after the last update). You can generalize this program if you prefer to create new columns rather than overwrite existing ones.
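Following that remark about creating new columns, here is a minimal sketch assuming the data.table package. Each result goes into a new column (the `.label` suffix is a hypothetical naming choice) and the original text is kept. Because := already modifies dt by reference, a plain for loop is enough: there is no list to unpack and no tempcol to clean up. The Internal.Placement pattern is written as `inside of our organisation$` so that it matches the sample text (the list_conditions entry `inside our organisation$` does not).

```r
library(data.table)

ext_txt <- "Spanish words We place outside of our company"
int_txt <- "Spanish words we place inside of our organisation"
dt <- data.table(
  External.Placement = c(NA, NA, ext_txt, NA, NA, ext_txt, ext_txt, ext_txt, NA, ext_txt),
  Internal.Placement = c(NA, NA, int_txt, NA, int_txt, int_txt, NA, NA, NA, NA),
  None.of.the.Above  = c("Ninguno none", "Ninguno none", rep(NA, 8))
)

list_conditions <- c(External.Placement = "outside of our company$",
                     Internal.Placement = "inside of our organisation$",
                     None.of.the.Above  = "^Ninguno")

for (var in names(list_conditions)) {
  new_col <- paste0(var, ".label")   # hypothetical name for the new column
  lab     <- gsub("\\.", " ", var)   # e.g. "External Placement"
  # := adds the new column to dt by reference; rows without a match get NA
  dt[, (new_col) := ifelse(grepl(list_conditions[[var]], get(var)), lab, NA_character_)]
}
print(dt)
```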

The output is:

dt
    External.Placement Internal.Placement None.of.the.Above
 1:               <NA>               <NA> None of the Above
 2:               <NA>               <NA> None of the Above
 3: External Placement               <NA>              <NA>
 4:               <NA>               <NA>              <NA>
 5:               <NA>               <NA>              <NA>
 6: External Placement               <NA>              <NA>
 7: External Placement               <NA>              <NA>
 8: External Placement               <NA>              <NA>
 9:               <NA>               <NA>              <NA>
10: External Placement               <NA>              <NA>

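Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.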

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM