Matching a word pattern and replacing the word with NA in R
I have two data frames in R: one is a list of names and the other is a word dictionary. If any part of a name matches an entry in the word dictionary, replace the name with NA; otherwise return the name unchanged.
Names - Dataframe
Name
Louis
Messi
duplessis
Jegan
Praveen
Word dictionary - Dataframe
Dictionary
vee
sis
Expected Output
Name Processed_Name
Louis Louis
Messi Messi
duplessis NA
Jegan Jegan
Praveen NA
library(data.table) # needed library
# create data
dt <- data.table("Name"=c("Louis",
"Messi",
"duplessis",
"Jegan",
"Praveen"))
dict <- c("vee","sis")
# make a combined vector of the words in the dictionary
dict_2 <- paste0(dict,collapse = "|")
# desired output
dt[,processed_Name:=ifelse(Name%like%dict_2,NA,Name)]
OUTPUT
Name processed_Name
1: Louis Louis
2: Messi Messi
3: duplessis NA
4: Jegan Jegan
5: Praveen NA
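The approach above joins the dictionary into one regular expression, so each entry is interpreted as a regex. If dictionary entries could contain regex metacharacters such as `.` or `(`, a literal match may be safer. Below is a minimal base R sketch of that variant; the names `names_vec`, `matched`, and `processed` are illustrative and not part of the original answer.

```r
# Same sample data as above, as plain vectors
names_vec <- c("Louis", "Messi", "duplessis", "Jegan", "Praveen")
dict <- c("vee", "sis")

# For each dictionary word, test every name for a literal substring match
# (fixed = TRUE disables regex interpretation), then OR the results together
matched <- Reduce(`|`, lapply(dict, function(w) grepl(w, names_vec, fixed = TRUE)))

# Mask matching names with NA; keep the rest unchanged
processed <- ifelse(matched, NA_character_, names_vec)
print(processed)
```

This loops over dictionary words instead of building one pattern, so it trades some speed for not having to escape special characters.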
UPDATE based on OP's comment
# Changed the input slightly so that it contains some of the numbers
# that I am going to generate for the dictionary.
dt <- data.table("Name"=c("Loui1s",
"Messi",
"duple2ssis",
"Jegan",
"Praveen"))
dict_all <- as.character(1:5000) # generate numbers so that all dictionary entries are different
dict_split <- split(dict_all, ceiling(seq_along(dict_all)/1000))
dict_split_2 <- lapply(dict_split, function(x){paste0(x, collapse = "|")})
dt[,processed_Name_2:=ifelse(Name%like%dict_split_2[[1]] | Name%like%dict_split_2[[2]] |
Name%like%dict_split_2[[3]] | Name%like%dict_split_2[[4]] |
Name%like%dict_split_2[[5]],NA,Name)]
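The update above writes out one `%like%` test per chunk, which only works while there are exactly five chunks. As a hedged sketch, the same idea can be generalized to any number of chunks with `Reduce`. It uses base R `grepl` on a plain vector rather than data.table, and the names `names_vec`, `patterns`, and `processed_2` are illustrative.

```r
# Same modified input as in the update, as a plain vector
names_vec <- c("Loui1s", "Messi", "duple2ssis", "Jegan", "Praveen")
dict_all <- as.character(1:5000)

# Split the dictionary into chunks of 1000 entries, one regex per chunk
chunks <- split(dict_all, ceiling(seq_along(dict_all) / 1000))
patterns <- vapply(chunks, paste0, character(1), collapse = "|")

# Test the names against every chunk pattern and OR the results,
# however many chunks there are
matched <- Reduce(`|`, lapply(patterns, function(p) grepl(p, names_vec)))

processed_2 <- ifelse(matched, NA_character_, names_vec)
print(processed_2)
```

Changing the chunk size or the dictionary length then needs no further edits to the matching step.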