Remove English and non-English names from a data frame

Question

I am working with hundreds of rows of messy data. Some dummy data looks like this:

   foo_data <- c("Mary Smith is not here", "Wiremu Karen is not a nice person",
                  "Rawiri Herewini is my name", "Ajibade Smith is my man", NA)

I need to remove all names (both English and non-English first names and surnames), so that my desired output would be:

[1] "is not here"          "is not a nice person" "is my name"
[4] "is my man"            NA

However, with the textclean package I can only remove the English names; the non-English names are left in place:

library(textclean)
textclean::replace_names(foo_data)

[1] "  is not here"     "Wiremu  is not a nice person"    "Rawiri Herewini is my name"  
[4] "Ajibade  is my man"           NA

Any help would be appreciated.

Answer 1

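You can do it in two steps: first strip the English names with textclean, then treat every word that hunspell flags as misspelled as a remaining (non-English) name and delete it: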

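s <- textclean::replace_names(foo_data)  # remove the English names textclean knows
# Words hunspell flags as misspelled are taken to be the remaining non-English names;
# delete them with one \b(name1|name2|...)\b pattern, then trim the leftover spaces
trimws(gsub(sprintf('\\b(%s)\\b',
      paste0(unlist(hunspell::hunspell(s)), collapse = '|')), '', s))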

[1] "is not here"          "is not a nice person" "is my name"           "is my man"            NA
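To see what the regex step does in isolation, here is a minimal base-R sketch that runs without textclean or hunspell. The vector `flagged` is a hard-coded, hypothetical stand-in for the words `hunspell::hunspell()` would report, and `s` approximates the `replace_names()` output shown in the question; both are assumptions made only for illustration.

```r
# Hypothetical stand-in for the words hunspell flags as misspelled
# (hard-coded so the snippet runs without the hunspell package)
flagged <- c("Wiremu", "Rawiri", "Herewini", "Ajibade")

# Approximation of the text after textclean::replace_names() removed the English names
s <- c("  is not here", "Wiremu  is not a nice person",
       "Rawiri Herewini is my name", "Ajibade  is my man", NA)

# One alternation with word boundaries: \b(Wiremu|Rawiri|Herewini|Ajibade)\b
pattern <- sprintf("\\b(%s)\\b", paste0(unique(flagged), collapse = "|"))

# Delete every flagged word, then trim whitespace at both ends; NA stays NA
cleaned <- trimws(gsub(pattern, "", s))
print(cleaned)
```

Keep in mind that hunspell flags any word missing from its dictionary, so genuine typos or rare English words would be deleted along with the names. Also, `trimws()` only trims the ends of each string, so a name removed from the middle of a sentence can leave a double space behind.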

Solution 1
2 2021-06-10 06:35:26