
Regular expression negation in R

I have a problem trying to find a way to implement negation in R regular expressions.

my_strings <- c("a non-rheumatic fever", "a nonrheumatic fever", "a rheumatic fever", "a not rheumatic fever")
my_strings
## [1] "a non-rheumatic fever" "a nonrheumatic fever" "a rheumatic fever" "a not rheumatic fever"

From the above vector, I'm trying to find a regular expression that will return just the following:

## [1] "a rheumatic fever"

I've tried the following, but I can't figure out how to negate the presence of "no(n|t)(\\s+|-)?" immediately preceding "rheumatic":

t_inc <- "\\b([^n][^o][^nt](\\s+|-)?(rheumatic))\\b"
grep(t_inc, my_strings, ignore.case = T, perl = T, value = T)
## character(0)

t_inc <- "\\b([^(no(n|t))](\\s+|-)?(rheumatic))\\b"
grep(t_inc, my_strings, ignore.case = T, perl = T, value = T)
## character(0)

Please could someone give me some pointers?

Maybe we can simplify this by making use of invert, as mentioned by @IceCreamToucan in the comments:

grep("no[nt][- ]?rheumatic", my_strings, invert = TRUE, value = TRUE)
#[1] "a rheumatic fever"

The pattern matches 'no', followed by either 'n' or 't', followed by an optional '-' or space, and then the word 'rheumatic'. With invert = TRUE, grep returns the elements that do not match the pattern.
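As a rough sketch of two related ways to express the same negation: the first uses `!grepl()` to subset the vector, which should behave like `invert = TRUE`; the second keeps everything in one pattern by using PCRE negative lookbehinds, which need `perl = TRUE`. The variable names (`neg_pattern`, `kept_by_negation`, `lookbehind_pattern`, `kept_by_lookbehind`) are only illustrative. Note that lookbehinds must be fixed-length, so this version assumes at most one separator character and does not cover the `\\s+` case from the question.

```r
my_strings <- c("a non-rheumatic fever", "a nonrheumatic fever",
                "a rheumatic fever", "a not rheumatic fever")

# Approach 1: logical negation of grepl(), equivalent in effect to invert = TRUE.
# Keeps the elements in which the "no[nt]" prefix pattern is NOT found.
neg_pattern <- "no[nt][- ]?rheumatic"
kept_by_negation <- my_strings[!grepl(neg_pattern, my_strings, ignore.case = TRUE)]

# Approach 2: a single regex with two fixed-length negative lookbehinds.
# (?<!no[nt])      rejects "non"/"not" directly before "rheumatic"
# (?<!no[nt][- ])  rejects "non"/"not" plus one hyphen or space before it
# Assumption: only a single separator character needs to be handled.
lookbehind_pattern <- "(?<!no[nt])(?<!no[nt][- ])rheumatic"
kept_by_lookbehind <- grep(lookbehind_pattern, my_strings,
                           ignore.case = TRUE, perl = TRUE, value = TRUE)

print(kept_by_negation)
print(kept_by_lookbehind)
```

The inverted match is usually easier to read and maintain; the lookbehind form is mainly useful when the negation has to live inside a larger pattern.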
