簡體   English   中英

R:基於字符串替換列值的有效方法(可能使用 case_when 或某種其他形式的 mutate)?

[英]R: Efficient way to replace column values based on strings (maybe with case_when or some other form of mutate)?

是否有更有效的方法來根據在列中搜索字符串來替換列中的值?

prac <- data.frame(`External Placement` = c(NA, NA, "Spanish words We place outside of our company", NA, NA, "Spanish words We place outside of our company", "Spanish words We place outside of our company", "Spanish words We place outside of our company", NA, "Spanish words We place outside of our company"),
             `Internal Placement` = c(NA, NA, "Spanish words we place inside of our organisation", NA, "Spanish words we place inside of our organisation", "Spanish words we place inside of our organisation", NA, NA, NA, NA),
             `None of the above` = c("Ninguno none", "Ninguno none", NA, NA, NA, NA, NA, NA, NA, NA))

這是樣本數據。 我使用了一系列 if/else 來獲取我想要的值,但我嘗試了很長時間來找出一種方法來使用 mutate、case_when 等來減少重復。 有沒有更好的辦法? 這是我到目前為止所做的:

prac$`Vocational Training` <- ifelse(grepl("vocational training$", prac$`Vocational Training`),
                                 "Vocational training", NA)
prac$`External Placement` <- ifelse(grepl("outside of our company$", prac$`External Placement`),
                                    "External placement", NA)
prac$`Internal Placement` <- ifelse(grepl("inside our organisation$", prac$`Internal Placement`),
                                    "Internal placement", NA)  
prac$`None of the Above` <- ifelse(grepl("^Ninguno", prac$`None of the Above`), "None or other", NA)

也許你可以試試下面的代碼

v <- c("outside of our company$","inside our organisation$","^Ninguno")
r <- c("External placement","Internal placement","None or other")

prac[]<- mapply(function(x,y) ifelse(x,y,NA),
                data.frame(mapply(grepl, v,prac)),
                r)

以至於

> prac
   External.Placement Internal.Placement None.of.the.above
1                <NA>               <NA>     None or other
2                <NA>               <NA>     None or other
3  External placement               <NA>              <NA>
4                <NA>               <NA>              <NA>
5                <NA>               <NA>              <NA>
6  External placement               <NA>              <NA>
7  External placement               <NA>              <NA>
8  External placement               <NA>              <NA>
9                <NA>               <NA>              <NA>
10 External placement               <NA>              <NA>

數據

prac <- structure(list(External.Placement = structure(c(NA, NA, 1L, NA, 
NA, 1L, 1L, 1L, NA, 1L), .Label = "Spanish words We place outside of our company", class = "factor"), 
    Internal.Placement = structure(c(NA, NA, 1L, NA, 1L, 1L, 
    NA, NA, NA, NA), .Label = "Spanish words we place inside of our organisation", class = "factor"), 
    None.of.the.above = structure(c(1L, 1L, NA, NA, NA, NA, NA, 
    NA, NA, NA), .Label = "Ninguno none", class = "factor")), class = "data.frame", row.names = c(NA, 
-10L))

這取決於你想要的效率是計算效率(計算時間)還是編程效率(使用的行數)

在編程效率方面,很難擊敗data.table解決方案。 我可以向你推薦這個:

prac <- data.frame(`External Placement` = c(NA, NA, "Spanish words We place outside of our company", NA, NA, "Spanish words We place outside of our company", "Spanish words We place outside of our company", "Spanish words We place outside of our company", NA, "Spanish words We place outside of our company"),
                   `Internal Placement` = c(NA, NA, "Spanish words we place inside of our organisation", NA, "Spanish words we place inside of our organisation", "Spanish words we place inside of our organisation", NA, NA, NA, NA),
                   `None of the Above` = c("Ninguno none", "Ninguno none", NA, NA, NA, NA, NA, NA, NA, NA))

library(data.table)

list_conditions <- c(#"Vocational.Training" = "vocational training$",
  "External.Placement" = "outside of our company$",
  "Internal.Placement" = "inside our organisation$",
  "None.of.the.Above" = "^Ninguno")


dt <- data.table(prac)

我的想法是使用一個帶有條件的向量,其名稱是它應該應用的變量。 然后是一個通過引用更新你的數據幀的lapply (這里你失去了一些編程效率,但你會有很好的計算效率)

dt <- lapply(seq_len(length(list_conditions)), function(i) {

  var <- names(list_conditions)[i]
  cond <- list_conditions[i]
  val <- gsub("\\.","", var)
  dt[, 'tempcol' := NA_character_]
  dt[grepl(as.character(cond), get(var)), tempcol := as.character(val)]
  dt[,c(var) := tempcol]

})
dt <- dt[[length(dt)]]
dt[,'tempcol' := NULL]

dt <- dt[[length(dt)]]在這里是因為 R 返回一個列表,但我們只對它的最后一個元素(數據幀的最后更新)感興趣。 如果您更喜歡創建新列而不是重寫現有列,則可以推廣此程序。

輸出是:

dt
    External.Placement Internal.Placement None.of.the.Above
 1:               <NA>               <NA> None of the Above
 2:               <NA>               <NA> None of the Above
 3: External Placement               <NA>              <NA>
 4:               <NA>               <NA>              <NA>
 5:               <NA>               <NA>              <NA>
 6: External Placement               <NA>              <NA>
 7: External Placement               <NA>              <NA>
 8: External Placement               <NA>              <NA>
 9:               <NA>               <NA>              <NA>
10: External Placement               <NA>              <NA>

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM