Remove both English and non-English names from a dataframe
I am working with hundreds of rows of messy data. A dummy example looks like this:
foo_data <- c("Mary Smith is not here", "Wiremu Karen is not a nice person",
"Rawiri Herewini is my name", "Ajibade Smith is my man", NA)
I need to remove all names (English and non-English first names and surnames), so that my desired output would be:
[1] "is not here" " is not a nice person" " is my name"
[4] "is my man" NA
However, using the textclean package, I can only remove the English names, which leaves the non-English ones behind:
library(textclean)
textclean::replace_names(foo_data)
[1] " is not here" "Wiremu is not a nice person" "Rawiri Herewini is my name"
[4] "Ajibade is my man" NA
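Any help would be greatly appreciated.
You could do it like this: strip the English names with textclean first, then use hunspell to find the remaining words that its dictionary does not recognise and remove those as well: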
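# Step 1: strip the English names that textclean recognises.
s <- textclean::replace_names(foo_data)
# Step 2: hunspell() returns the words its dictionary does not recognise;
# join them into a \b(word1|word2|...)\b pattern and delete the matches.
trimws(gsub(sprintf('\\b(%s)\\b',
                    paste0(unlist(hunspell::hunspell(s)), collapse = '|')), '', s))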
[1] "is not here" "is not a nice person" "is my name" "is my man" NA
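To make the pattern-building step easier to follow, here is a minimal base-R sketch of the same idea. It is an illustration under stated assumptions, not part of the original answer: the vector unknown_words is hard-coded as a stand-in for what unlist(hunspell::hunspell(s)) might return, and remove_words is a hypothetical helper name. The sketch also handles the case where no unknown words are found and collapses the double spaces that removal can leave behind.

```r
# Hypothetical stand-in for unlist(hunspell::hunspell(s)): the words a spell
# checker did not recognise. Hard-coded so the sketch runs in base R alone.
unknown_words <- c("Wiremu", "Rawiri", "Herewini", "Ajibade")

# Input as it might look after the English names have already been stripped.
s <- c(" is not here", "Wiremu  is not a nice person",
       "Rawiri Herewini is my name", "Ajibade  is my man", NA)

# remove_words is an illustrative helper, not a function from any package.
remove_words <- function(x, words) {
  words <- unique(words[!is.na(words) & nzchar(words)])
  if (length(words) == 0) return(trimws(x))  # nothing to remove
  # Build \b(word1|word2|...)\b so that only whole words are matched.
  pattern <- sprintf("\\b(%s)\\b", paste(words, collapse = "|"))
  # Delete the matches, collapse leftover runs of spaces, trim both ends.
  out <- gsub(pattern, "", x)
  trimws(gsub(" +", " ", out))
}

cleaned <- remove_words(s, unknown_words)
print(cleaned)
```

Keep in mind that a spell checker flags every word missing from its dictionary, not only names, so misspellings and rare words in the real data would likely be removed too. It may be worth inspecting the list of flagged words before deleting them.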