Finding matches across character vectors in R

Question

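Given the two vectors below, is there a way to generate the desired data frame? This represents a real-world situation in which I have two data frames: the first contains a column of database values (keys), and the second contains a column of 1000+ rows, each a file name that I (may) need to match. The problem is that several files can (potentially) match any given key. I have tried grep, merge, inner joins, and so on, but could not combine them into a single solution. Any suggestions are appreciated!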

potentials <- c("tigerINTHENIGHT",
            "tigerWALKINGALONE",
            "bearOHMY",
            "bearWITHME",
            "rat",
            "imatchnothing")
keys <- c("tiger",
            "bear",
            "rat")


desired <- data.frame(keys, c("tigerINTHENIGHT, tigerWALKINGALONE", "bearOHMY, bearWITHME", "rat"))
names(desired) <- c("key", "matches")

Pseudocode for the solution I have in mind:

#new column which is comma separated potentials
# x being the substring length i.e. x = 4 means true if first 4 letters match
function createNewColumn(keys, potentials, x){
  str result = na
  foreach(key in keys){
    if(substring(key, 0, x) == any(substring(potentials, 0 ,x))){ //search entire potential vector
      result += potential that matched + ', '
    }
  }
  return new column with result as the value on the current row
}

Answer 1

We can write a small function that extracts the matches and then loops over the keys:

return_matches <- function(keys, potentials, fixed = TRUE) {
  vapply(keys, function(k) {
    paste(grep(k, potentials, value = TRUE, fixed = fixed), collapse = ", ")
  }, FUN.VALUE = character(1))
}

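vapply is simply a type-safe version of sapply, which here means it will only ever return a character vector. When you set fixed = TRUE, the function runs faster but no longer recognizes regular expressions. We can then easily build the desired data.frame: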

df <- data.frame(
  key = keys,
  matches = return_matches(keys, potentials),
  stringsAsFactors = FALSE
)
df
#>         key                            matches
#> tiger tiger tigerINTHENIGHT, tigerWALKINGALONE
#> bear   bear               bearOHMY, bearWITHME
#> rat     rat                                rat

The only reason for putting the loop in a function instead of running it directly is to make the code look cleaner.
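
As a minimal sketch of the two remarks above about fixed = TRUE and vapply: the vector files and the key "a.b" are made up for this illustration and are not part of the question. With fixed = TRUE the key is matched literally, while with fixed = FALSE it is read as a regular expression, so a metacharacter such as "." matches any character. The last lines show the type-safety point on empty input.

```r
# Hypothetical data, used only for this illustration
files <- c("a.b_report", "aXb_report", "other")

# Literal match: only the file containing the exact text "a.b"
literal <- grep("a.b", files, value = TRUE, fixed = TRUE)
literal
#> [1] "a.b_report"

# Regex match: "." matches any single character, so "aXb" matches too
regex <- grep("a.b", files, value = TRUE, fixed = FALSE)
regex
#> [1] "a.b_report" "aXb_report"

# Type safety: on empty input sapply falls back to a list,
# while vapply still returns a character vector
empty_sapply <- sapply(character(0), function(k) "x")
empty_vapply <- vapply(character(0), function(k) "x", FUN.VALUE = character(1))
is.list(empty_sapply)
#> [1] TRUE
is.character(empty_vapply)
#> [1] TRUE
```

If the keys may contain characters such as ".", "(" or "+", keeping the default fixed = TRUE is probably the safer choice.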

Answer 2

You can iterate over the keys with grep:

> Match <- sapply(keys, function(item) {
    paste0(grep(item, potentials, value = TRUE), collapse = ", ")
  })

> data.frame(keys, Match, row.names = NULL)
       keys                              Match
    1 tiger tigerINTHENIGHT, tigerWALKINGALONE
    2  bear               bearOHMY, bearWITHME
    3   rat                                rat
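
Both answers use grep, which matches a key anywhere inside a file name, whereas the pseudocode in the question compares only the leading characters. The following is a sketch of a prefix-only variant, assuming the whole key (rather than its first x characters) must appear at the start of the file name. The helper name prefix_matches is hypothetical; startsWith is part of base R (3.3.0 and later).

```r
potentials <- c("tigerINTHENIGHT", "tigerWALKINGALONE", "bearOHMY",
                "bearWITHME", "rat", "imatchnothing")
keys <- c("tiger", "bear", "rat")

# Hypothetical helper: for each key, collect the potentials that start with it
prefix_matches <- function(keys, potentials) {
  vapply(keys, function(k) {
    # startsWith() returns one TRUE/FALSE per element of potentials
    paste(potentials[startsWith(potentials, k)], collapse = ", ")
  }, FUN.VALUE = character(1), USE.NAMES = FALSE)
}

result <- data.frame(
  key = keys,
  matches = prefix_matches(keys, potentials),
  stringsAsFactors = FALSE
)
result
#>     key                            matches
#> 1 tiger tigerINTHENIGHT, tigerWALKINGALONE
#> 2  bear               bearOHMY, bearWITHME
#> 3   rat                                rat
```

On the question's data the result is the same as with grep. The two approaches differ only when a key occurs in the middle of a file name: for example, grep would match the key "rat" against a file called "pirate", while startsWith would not.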
