I am seeking your help to understand how to match strings in R, like PRXMATCH() does in SAS.
List1 <- c("lead", "good")
List2 <- c("Quality", "understand")
Name <- c("grp1", "grp2")
I have a data frame containing a column sentence. For every sentence, each word in List1 is looked for. If it is found, the corresponding word in List2 is looked for within +-5 words of it, and if that is also found, then the name from Name should be added to the result column. For example, "lead" is searched for in all sentences. When "lead" is found, then within that sentence, if "Quality" is found within a distance of +-5 words, "grp1" should be added to the result column; otherwise it is discarded.
Something like this, maybe?
myData <- data.frame(sentence = c("The quality bla bla bla lead bla",
                                  "The quality bla bla bla bla bla lead bla",
                                  "The lead quality bla bla",
                                  "The lead bla bla quality",
                                  "The lead bla bla bla bla bla quality of",
                                  "It allows us to understand how good bla",
                                  "It is good to understand that bla",
                                  "It is also good bla bla bla if we understand",
                                  "lead quality is good to understand"),
                     Result = "",
                     stringsAsFactors = FALSE)
List1 <- c("lead", "good")
List2 <- c("quality", "understand")
Name <- c("grp1", "grp2")
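# One pattern per word pair: both words in either order, with at most 4 other words in between
# (named patterns rather than regexpr to avoid masking base::regexpr)
patterns <- paste0("(\\b", List1, "\\s+(\\w+\\s+){0,4}", List2, "\\b)|(\\b", List2, "\\s+(\\w+\\s+){0,4}", List1, "\\b)")
# Append the group name to Result for every sentence that matches the pair's pattern
for (i in seq_along(patterns)) {
  myData$Result <- ifelse(grepl(pattern = patterns[i], x = myData$sentence),
                          yes = trimws(paste(myData$Result, Name[i])),  # trimws drops the leading space
                          no = myData$Result)
}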
> myData
                                      sentence    Result
1             The quality bla bla bla lead bla      grp1
2     The quality bla bla bla bla bla lead bla
3                     The lead quality bla bla      grp1
4                     The lead bla bla quality      grp1
5      The lead bla bla bla bla bla quality of
6      It allows us to understand how good bla      grp2
7            It is good to understand that bla      grp2
8 It is also good bla bla bla if we understand
9           lead quality is good to understand grp1 grp2
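
The question spells the second word as "Quality" while the sentences may use lower case, and the allowed distance is hard-coded in the pattern. A minimal sketch of how both could be made adjustable follows. The helper names build_patterns and tag_sentences and the max_gap argument are hypothetical and not part of the answer above, and the sketch assumes the search words contain no regex metacharacters.

```r
# Hypothetical helper (an assumption, not from the answer above): build one
# proximity pattern per word pair, allowing at most `max_gap` other words
# between the two words, in either order.
build_patterns <- function(first, second, max_gap = 4) {
  gap <- paste0("\\s+(\\w+\\s+){0,", max_gap, "}")
  paste0("(\\b", first, gap, second, "\\b)|",
         "(\\b", second, gap, first, "\\b)")
}

# Hypothetical helper: for each sentence, collect the names of all word
# pairs found within the allowed distance, ignoring letter case.
tag_sentences <- function(sentences, first, second, group_names, max_gap = 4) {
  patterns <- build_patterns(first, second, max_gap)
  result <- character(length(sentences))
  for (i in seq_along(patterns)) {
    hit <- grepl(patterns[i], sentences, ignore.case = TRUE)
    result[hit] <- trimws(paste(result[hit], group_names[i]))
  }
  result
}

sentences <- c("The Quality bla bla bla lead bla",
               "The lead bla bla bla bla bla quality of",
               "lead quality is good to understand")
tags <- tag_sentences(sentences,
                      first = c("lead", "good"),
                      second = c("Quality", "understand"),
                      group_names = c("grp1", "grp2"))
print(tags)
```

With the default max_gap = 4 this should behave like the loop above, except that "Quality" and "quality" are treated as the same word; raising or lowering max_gap widens or narrows the allowed distance.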