agrep string matching in R

I have two lists of product names. My problem is that "Operating system" matches "system", "cooling system", etc., but it should match only "Operating" or "OS". Another example: "Key Board" should match "key" or "KB", but not "Mother Board" or just "Board".

How can I give the first word more importance than the second?

I used agrep() in R. For the first example it also matches "system" and "cooling system". How can I avoid those matches?

Also, is there a function or method to match "key board" with "KB" and "operating system" with "OS"?

Thanks in advance.

I have written a function for this. It is not the most optimized way to do it, but it does the task. The inputs are vectors, not lists, and str_split() comes from the stringr package. Hope this helps.

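library(stringr)  # provides str_split()

# Return the indices of inputstring that contain either the full search
# string or its initials (e.g. "key board" -> "kb") as whole words.
stringMatch <- function(search.string, inputstring, pattern = " ") {
  # Split the search string into words and join their first letters
  stringsplit <- unlist(str_split(search.string, pattern))
  firstletter <- paste(substring(stringsplit, 1, 1), collapse = "")

  # Compare case-insensitively
  search.string.l <- tolower(search.string)
  firstletter.l <- tolower(firstletter)

  # \\b word boundaries: match the phrase or the initials only as whole words
  regex <- paste0("\\b", search.string.l, "\\b|\\b", firstletter.l, "\\b")
  matchstring <- grep(regex, tolower(inputstring))
  return(matchstring)
}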

test1 <- c('hello p', 'helbbo', 'hello test', 'HP')
search.string <- 'HP'
stringMatch(search.string, test1)
[1] 4
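
The function above covers the full phrase and the initials, but not the "first word only" case from the question ("Operating system" matching "Operating"). A minimal sketch of one way to add that, using base R only, is below. The name matchFirstWord and the candidates vector are hypothetical and not part of the original answer, and the sketch assumes that a candidate must equal the full phrase, the first word, or the initials exactly (ignoring case and surrounding whitespace).

```r
# Hypothetical helper (not from the original answer): accept a candidate only
# if it equals the full phrase, the first word, or the initials of `name`.
matchFirstWord <- function(name, candidates) {
  # Lower-case the name and split it on runs of whitespace
  words <- strsplit(tolower(trimws(name)), "\\s+")[[1]]

  # Initials, e.g. "key board" -> "kb"
  acronym <- paste(substring(words, 1, 1), collapse = "")

  # Accepted keys: full phrase, first word alone, initials
  keys <- unique(c(paste(words, collapse = " "), words[1], acronym))

  # Whole-string, case-insensitive comparison, so "system",
  # "cooling system" and "Board" are rejected
  which(tolower(trimws(candidates)) %in% keys)
}

# Assumed example data built from the names mentioned in the question
candidates <- c("system", "cooling system", "Operating", "OS",
                "Mother Board", "key", "KB", "Board")

matchFirstWord("Operating system", candidates)
# [1] 3 4
matchFirstWord("Key Board", candidates)
# [1] 6 7
```

Because the comparison is on the whole candidate string rather than a substring, the second word never matches on its own. If some tolerance for typos is needed, agrep() could be applied to the keys instead of %in%, at the cost of reintroducing looser matches.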
