String replacement: how to handle similar strings and spaces

Question

Context: I am translating a table from French to English, using a second table that contains the corresponding replacements.

Problem: the strings are sometimes very similar, and when spaces are involved, str_replace_all() does not take the whole string into account.

Reproducible example:

library(stringr)  #needed for the str_replace_all() function

#datasets

# test is the table indicating corresponding strings
test = data.frame(fr = as.character(c("Autre", "Autres", "Autre encore")),
                  en = as.character(c("Other", "Others", "Other again")),
                  stringsAsFactors = FALSE)
# test1 is the table I want to translate
test1 = data.frame(totrans = as.character(c("Autre", "Autres", "Autre encore")),
                   stringsAsFactors = FALSE)

# translate with a named vector: names are the French patterns, values the English replacements
test2 = str_replace_all(test1$totrans, setNames(test$en, test$fr))

Output:

I get:

> test2
[1] "Other"        "Others"       "Other encore"

Expected result:

> testexpected
[1] "Other"       "Others"      "Other again"

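As you can see, when the strings start the same way but contain no space, the replacement succeeds (see "Other" and "Others"), but when there is a space it fails ("Autre encore" becomes "Other encore" instead of "Other again").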

I feel the answer is obvious, but I can't work out how to fix it... Any suggestions are welcome.
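The output above is consistent with str_replace_all() applying the pairs of a named vector one after another, in the order they appear, rather than with the space itself being the problem. "Autre" is handled first, so "Autre encore" has already become "Other encore" by the time the "Autre encore" pattern is tried; "Autres" only comes out right because "Other" followed by "s" happens to spell "Others". The sketch below replays those steps by hand for the third string (the names dict, x and step1 to step3 are illustrative only):

```r
library(stringr)

# Named vector as in the question: names are patterns, values are replacements
dict <- c("Autre" = "Other", "Autres" = "Others", "Autre encore" = "Other again")

x <- "Autre encore"

# The same three replacements, applied by hand one after another in vector order
step1 <- str_replace_all(x, "Autre", "Other")                   # "Other encore"
step2 <- str_replace_all(step1, "Autres", "Others")             # no match: unchanged
step3 <- str_replace_all(step2, "Autre encore", "Other again")  # nothing left to match

step3
#> [1] "Other encore"

# The named-vector call gives the same result as the manual sequence
str_replace_all(x, dict)
#> [1] "Other encore"
```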

Answer 1

I think you just need to look for the surrounding word boundaries (i.e. "\\b"). These are easy to add inside str_replace_all with a paste0 call.
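To see what the paste0() call produces, here is a small sketch (fr and patterns are illustrative names). In R source "\\b" is the regex token \b, which matches between a word character and a non-word character or the edge of the string. So the wrapped pattern for "Autre" no longer matches inside "Autres", but it still matches the first word of "Autre encore", because a space counts as a boundary.

```r
library(stringr)

fr <- c("Autre", "Autres", "Autre encore")

# paste0() wraps every French term in word boundaries
patterns <- paste0("\\b", fr, "\\b")
writeLines(patterns)
#> \bAutre\b
#> \bAutres\b
#> \bAutre encore\b

str_detect("Autres", patterns[1])        # FALSE: "e" followed by "s" is not a boundary
str_detect("Autre encore", patterns[1])  # TRUE: the space is a boundary
```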

Note that you don't need to load the whole tidyverse for this; the str_replace_all function is part of the stringr package, which is just one of several packages loaded when you call library(tidyverse):

library(stringr) 

test = data.frame(fr = as.character(c("Autre", "Autres", "Autre encore")),
                  en = as.character(c("Other", "Others", "Other again")),
                  stringsAsFactors = FALSE)

test1 = data.frame(totrans = as.character(c("Autre", "Autres", "Autre encore")),
                   stringsAsFactors = FALSE)

str_replace_all(test1$totrans, paste0("\\b", test$fr, "\\b"), test$en)
#> [1] "Other"       "Others"      "Other again"

Created on 2020-05-14 by the reprex package (v0.3.0)
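A caveat worth checking: when pattern and replacement are passed as plain (unnamed) vectors, str_replace_all() is vectorised element by element, so the i-th string is only tried against the i-th pattern. The call above returns the expected result because test1$totrans lists the strings in the same order as test$fr. The sketch below uses a reordered, hypothetical input totrans to show what happens otherwise, followed by two possible alternatives for translating whole cells: an exact lookup with base R's match(), and a named vector whose patterns are anchored with ^ and $. Both assume each cell holds exactly one term from the table, and the regex version also assumes the French terms contain no regex metacharacters (otherwise they would need escaping).

```r
library(stringr)

test <- data.frame(fr = c("Autre", "Autres", "Autre encore"),
                   en = c("Other", "Others", "Other again"),
                   stringsAsFactors = FALSE)

# Hypothetical input: the same terms as test1$totrans, in a different order
totrans <- c("Autre encore", "Autre", "Autres")

# Positional pairing: string i is only matched against pattern i
positional <- str_replace_all(totrans, paste0("\\b", test$fr, "\\b"), test$en)
positional
#> [1] "Other encore" "Autre"        "Autres"

# Alternative 1 (base R): exact lookup; cells missing from the table become NA
via_match <- test$en[match(totrans, test$fr)]

# Alternative 2 (stringr): patterns anchored to the whole string, so "^Autre$"
# cannot match the beginning of "Autre encore"
via_regex <- str_replace_all(totrans, setNames(test$en, paste0("^", test$fr, "$")))

via_match
#> [1] "Other again" "Other"       "Others"
identical(via_match, via_regex)
#> [1] TRUE
```

The match() version is usually the simpler choice when every cell is a complete term; the anchored regex keeps the str_replace_all() style of the answer and leaves untranslated cells unchanged instead of turning them into NA.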

Solution 1
2 Accepted 2020-05-14 12:58:12