
String Matching in R like PRXMATCH() in SAS

I am looking for help with matching strings in R the way PRXMATCH() does in SAS.

List1 <- c("lead", "good")
List2 <- c("Quality", "understand")
Name  <- c("grp1", "grp2")

I have a data frame with a column named sentence. For every sentence I need to:

  • Look for each word in List1.
  • If a word is found, look for the corresponding word in List2.
  • If that word is found within ±5 words of the word from List1, add the corresponding name from Name to the result column.

For example, "lead" is searched for in all sentences. When "lead" is found in a sentence and "Quality" occurs within ±5 words of it in that same sentence, "grp1" should be added to the result column; otherwise the match is discarded.
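The rule above can be made explicit with a minimal sketch that works on word positions instead of a regular expression. The helper name within_distance and its max_dist parameter are hypothetical, not part of the original post. The sketch assumes that "±5 words" means the two word positions differ by at most 5, and that matching should ignore case.

```r
# Hypothetical helper (not from the original post): TRUE when `w1` and `w2`
# occur within `max_dist` word positions of each other in `sentence`.
within_distance <- function(sentence, w1, w2, max_dist = 5) {
  # Lower-case the sentence and split it on runs of non-alphanumeric characters.
  words <- strsplit(tolower(sentence), "[^[:alnum:]]+")[[1]]
  words <- words[nzchar(words)]
  pos1 <- which(words == tolower(w1))
  pos2 <- which(words == tolower(w2))
  if (length(pos1) == 0 || length(pos2) == 0) return(FALSE)
  # Compare every occurrence of w1 with every occurrence of w2.
  any(abs(outer(pos1, pos2, "-")) <= max_dist)
}

near <- within_distance("The quality bla bla bla lead bla", "lead", "Quality")
far  <- within_distance("The quality bla bla bla bla bla lead bla", "lead", "Quality")
```

In the first sentence the two words are 4 positions apart, in the second they are 6 apart, so only the first satisfies the rule under this reading of the distance.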

Something like this, maybe?

myData <- data.frame(sentence = c("The quality bla bla bla lead bla", 
                                  "The quality bla bla bla bla bla lead bla",
                                  "The lead quality bla bla",
                                  "The lead bla bla quality",
                                  "The lead bla bla bla bla bla quality of",
                                  "It allows us to understand how good bla",
                                  "It is good to understand that bla",
                                  "It is also good bla bla bla if we understand",
                                  "lead quality is good to understand"),
                     Result = "",
                     stringsAsFactors = FALSE)

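# List2 is lower-cased here because grepl() is case-sensitive by default
# and the sample sentences use "quality".
List1 <- c("lead", "good")
List2 <- c("quality", "understand")
Name  <- c("grp1", "grp2")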

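# Build one pattern per (List1, List2) pair: the two words, in either order,
# separated by at most four other words.
patterns <- paste0("(\\b", List1, "\\s+(\\w+\\s+){0,4}", List2, "\\b)|",
                   "(\\b", List2, "\\s+(\\w+\\s+){0,4}", List1, "\\b)")

# Append the group name to Result for every sentence matching that pair's pattern.
for (i in seq_along(patterns)) {
  myData$Result <- ifelse(grepl(pattern = patterns[i], x = myData$sentence),
                          yes = paste(myData$Result, Name[i]),
                          no = myData$Result)
}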

Results

> myData
                                      sentence     Result
1             The quality bla bla bla lead bla       grp1
2     The quality bla bla bla bla bla lead bla           
3                     The lead quality bla bla       grp1
4                     The lead bla bla quality       grp1
5      The lead bla bla bla bla bla quality of           
6      It allows us to understand how good bla       grp2
7            It is good to understand that bla       grp2
8 It is also good bla bla bla if we understand           
9           lead quality is good to understand  grp1 grp2
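One caveat: grepl() is case-sensitive by default, so a pattern built from the question's original List2 ("Quality") would not match "quality" in these sentences. A possible variant, sketched below, passes ignore.case = TRUE and collects the names without a leading space. The function name tag_sentences and its max_gap parameter are hypothetical and only used for illustration. max_gap = 4 keeps the same "at most four words in between" limit as the pattern above.

```r
# Hypothetical helper: tag each sentence with the names of all matching pairs.
tag_sentences <- function(sentences, list1, list2, names, max_gap = 4) {
  gap <- paste0("\\s+(\\w+\\s+){0,", max_gap, "}")
  # One pattern per pair, allowing the two words in either order.
  patterns <- paste0("\\b", list1, gap, list2, "\\b|",
                     "\\b", list2, gap, list1, "\\b")
  # One logical column per pattern; ignore.case lets "Quality" match "quality".
  hits <- vapply(patterns,
                 function(p) grepl(p, sentences, ignore.case = TRUE),
                 logical(length(sentences)))
  hits <- matrix(hits, nrow = length(sentences))
  # Join the names of all matching pairs for each sentence.
  apply(hits, 1, function(row) paste(names[row], collapse = " "))
}

tags <- tag_sentences(c("The Lead quality bla bla",
                        "lead quality is good to understand",
                        "nothing relevant here"),
                      list1 = c("lead", "good"),
                      list2 = c("Quality", "understand"),
                      names = c("grp1", "grp2"))
```

Sentences that match no pair get an empty string, and sentences that match several pairs get the names separated by a single space.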
