Regular expression (regex lookarounds) to detect a certain string not between certain strings (lookahead & lookbehind, word not surrounded by words)

Question

I am trying to detect all occurrences of a certain string that is not surrounded by certain other strings (using regex lookarounds), e.g. all occurrences of "African" but not "South African Society". See the simplified example below.

#My example text:
text <- c("South African Society", "South African", 
"African Society", "South African Society and African Society")

#My code examples:
library(stringr)
str_detect(text, "(?<!South )African(?! Society)")
#or
grepl("(?<!South )African(?! Society)", text, perl = TRUE)

#I need:
[1] FALSE TRUE TRUE TRUE 

#instead of:
[1] FALSE FALSE FALSE FALSE

The problem seems to be that the regex evaluates the lookbehind and the lookahead separately rather than as a whole. A match should be excluded only when both conditions hold, not when just one of them does.

Answer 1

The (?<!South )African(?! Society) pattern matches African only when it is neither preceded by "South " nor followed by " Society". If either one is present, there is no match.
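To see why every element comes back FALSE, the two conditions can be checked separately. The following is a minimal sketch in base R. The variable names (preceded, followed, original, wanted) are only for illustration, and the input is restricted to the strings with a single occurrence of African so that a per-string check corresponds to a per-occurrence check.

```r
# Strings with exactly one occurrence of "African"
text <- c("South African Society", "South African", "African Society")

# Is "African" preceded by "South "? Is it followed by " Society"?
preceded <- grepl("(?<=South )African", text, perl = TRUE)
followed <- grepl("African(?= Society)", text, perl = TRUE)

# The original pattern: matches only if NEITHER condition holds
original <- grepl("(?<!South )African(?! Society)", text, perl = TRUE)

# What the question asks for: NOT (preceded AND followed)
wanted <- !(preceded & followed)

print(data.frame(text, preceded, followed, original, wanted))
```

For these inputs, the original column equals !preceded & !followed, whereas the requirement is !(preceded & followed), so the two lookarounds have to be combined into a single check.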

There are several solutions.

 African(?<!South African(?= Society))

See the regex demo. Here, once African has been matched, the negative lookbehind checks that the text ending at the current position is not South African immediately followed by a space and Society. The match is rejected only when both parts are present. Placing this check after African is more efficient when there are long strings that do not match the pattern than placing it before the word, as in (?<!South (?=African Society))African (see the regex demo), since the check then runs only at positions where African has already matched.
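As a quick check against the example data from the question, here is a small sketch that runs both lookaround variants through grepl with perl = TRUE (PCRE). The variable names after_word and before_word are made up for this illustration.

```r
text <- c("South African Society", "South African",
          "African Society", "South African Society and African Society")

# Check placed after the word: a lookahead nested inside the negative lookbehind
after_word <- grepl("African(?<!South African(?= Society))", text, perl = TRUE)

# Check placed before the word
before_word <- grepl("(?<!South (?=African Society))African", text, perl = TRUE)

print(after_word)
print(before_word)
```

Both calls should return FALSE TRUE TRUE TRUE, which is the result requested in the question.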

Alternatively, you may use the SKIP-FAIL technique:

South African Society(*SKIP)(*F)|African

See another regex demo. Here, South African Society is matched first, and (*SKIP)(*F) then discards that match and resumes the search right after it, so African is matched in all contexts other than South African Society.
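A minimal sketch of the SKIP-FAIL pattern on the question's data follows. The (*SKIP) and (*F) verbs are PCRE features, so this assumes base R with perl = TRUE rather than str_detect. The names pattern, detected and counts are illustrative only.

```r
text <- c("South African Society", "South African",
          "African Society", "South African Society and African Society")

pattern <- "South African Society(*SKIP)(*F)|African"

# Detection: does the string contain an allowed "African"?
detected <- grepl(pattern, text, perl = TRUE)

# Number of allowed occurrences per string
counts <- lengths(regmatches(text, gregexpr(pattern, text, perl = TRUE)))

print(detected)
print(counts)
```

The gregexpr/regmatches step is only there to show that the same pattern can also be used to extract or count the individual occurrences, not just to flag the strings that contain one.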

solution1
3 ACCEPTED 2018-11-28 18:30:02
