I am trying to detect all occurrences of a certain string that is not surrounded by certain other strings (using regex lookarounds), e.g. all occurrences of "African" but not "South African Society". See a simplified example below.
#My example text:
text <- c("South African Society", "South African",
"African Society", "South African Society and African Society")
#My code examples:
library(stringr)
str_detect(text, "(?<!South )African(?! Society)")
#or
grepl("(?<!South )African(?! Society)", text, perl = TRUE)
#I need:
[1] FALSE TRUE TRUE TRUE
#instead of:
[1] FALSE FALSE FALSE FALSE
The problem seems to be that the regex evaluates the lookbehind and the lookahead separately rather than as a whole. A match should be rejected only when both surrounding strings are present, not when just one of them is.
The (?<!South )African(?! Society) pattern matches African only when it is neither preceded by South nor followed by Society. If there is South before it or Society after it, there will be no match.
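To see why every element comes back FALSE, it can help to test each lookaround on its own. The sketch below uses base R grepl with perl = TRUE on the sample vector from the question; the variable names only_behind, only_ahead and both are illustrative and not part of the original post.

```r
text <- c("South African Society", "South African",
          "African Society", "South African Society and African Society")

# Lookbehind only: reject "African" when "South " comes right before it
only_behind <- grepl("(?<!South )African", text, perl = TRUE)

# Lookahead only: reject "African" when " Society" comes right after it
only_ahead <- grepl("African(?! Society)", text, perl = TRUE)

# Both lookarounds: a given "African" has to pass both checks at once
both <- grepl("(?<!South )African(?! Society)", text, perl = TRUE)

print(only_behind)  # FALSE FALSE TRUE TRUE
print(only_ahead)   # FALSE TRUE FALSE FALSE
print(both)         # FALSE FALSE FALSE FALSE
```

Every African in the sample fails at least one of the two checks, which is why the combined pattern finds nothing.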
There are several solutions. One is to put a lookbehind that contains a lookahead after African:
African(?<!South African(?= Society))
See the regex demo. Here, African is only matched when the regex engine does not find South African ending at the position right after the matched African substring and immediately followed by a space and Society. Placing this check after African is more efficient than placing it before the word African (see the (?<!South (?=African Society))African regex demo), especially with longer strings that do not match the pattern, because the lookbehind is then only evaluated once African has been found.
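As a quick check, the nested-lookaround pattern can be run against the sample vector from the question. This is a minimal sketch using base R grepl with perl = TRUE (the PCRE engine); the variable names pattern and result are illustrative.

```r
text <- c("South African Society", "South African",
          "African Society", "South African Society and African Society")

# Match "African", then look back: fail only if the text ending here is
# "South African" AND what follows the current position is " Society"
pattern <- "African(?<!South African(?= Society))"

result <- grepl(pattern, text, perl = TRUE)
print(result)  # FALSE TRUE TRUE TRUE
```

The fourth element is TRUE because its second African is followed by Society but not preceded by South, so the combined condition does not apply to it.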
Alternatively, you may use the SKIP-FAIL technique:
South African Society(*SKIP)(*F)|African
See another regex demo. Here, South African Society is matched first, and (*SKIP)(*F) makes this match fail and the engine proceed to search for the next match after the skipped text, so African is matched in all contexts other than South African Society.
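The SKIP-FAIL pattern can be tried the same way. The sketch below assumes base R with perl = TRUE, since (*SKIP) and (*F) are PCRE backtracking verbs and stringr's ICU engine does not support them. The names detected and counts are illustrative; counts shows how many occurrences of African survive in each string.

```r
text <- c("South African Society", "South African",
          "African Society", "South African Society and African Society")

# Left alternative consumes the unwanted phrase and discards it;
# right alternative matches "African" anywhere else
pattern <- "South African Society(*SKIP)(*F)|African"

detected <- grepl(pattern, text, perl = TRUE)
print(detected)  # FALSE TRUE TRUE TRUE

# Number of accepted "African" matches per string
counts <- lengths(regmatches(text, gregexpr(pattern, text, perl = TRUE)))
print(counts)    # 0 1 1 1
```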