
Regular expression (regex lookarounds) to detect a certain string not between certain strings (lookahead & lookbehind, word not surrounded by words)

I am trying to detect all occurrences of a certain string that is not surrounded by certain strings (using regex lookarounds), e.g. all occurrences of "African" except those inside "South African Society". See a simplified example below.

#My example text:
text <- c("South African Society", "South African", 
"African Society", "South African Society and African Society")

#My code examples:
library(stringr)
str_detect(text, "(?<!South )African(?! Society)")
#or
grepl("(?<!South )African(?! Society)", text, perl = TRUE)

#I need:
[1] FALSE TRUE TRUE TRUE 

#instead of:
[1] FALSE FALSE FALSE FALSE

The problem seems to be that the regex evaluates the lookbehind and the lookahead separately rather than as a whole. It should reject a match only when both conditions hold, not when just one of them does.

The (?<!South )African(?! Society) pattern matches African only when it is neither preceded by "South " nor followed by " Society". If "South " comes right before it or " Society" comes right after it, there is no match.
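To see that the two lookarounds act as independent conditions that must both hold, it can help to test each one on its own. The following is a minimal sketch using base R's grepl with perl = TRUE; the variable names no_south, no_society and both are only illustrative.

```r
text <- c("South African Society", "South African",
          "African Society", "South African Society and African Society")

# Lookbehind alone: is there an "African" not preceded by "South "?
no_south <- grepl("(?<!South )African", text, perl = TRUE)
print(no_south)
# [1] FALSE FALSE  TRUE  TRUE

# Lookahead alone: is there an "African" not followed by " Society"?
no_society <- grepl("African(?! Society)", text, perl = TRUE)
print(no_society)
# [1] FALSE  TRUE FALSE FALSE

# Both lookarounds must succeed on the same "African"
both <- grepl("(?<!South )African(?! Society)", text, perl = TRUE)
print(both)
# [1] FALSE FALSE FALSE FALSE
```

No string contains an occurrence of African that passes both checks at once, which is why the combined pattern returns FALSE everywhere.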

There are several solutions.

 African(?<!South African(?= Society))

See the regex demo . Here, African is matched first, and the negative lookbehind then fails the match only if the text ending at the current position is South African and it is immediately followed by a space and Society . Placing this check after African is more efficient than placing it before the word, because the engine runs the check only at positions where African has already matched, which matters for long strings with few matches (see the (?<!South (?=African Society))African regex demo ).
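Applied to the example vector from the question, the lookbehind pattern can be sketched as follows. This assumes base R's grepl with perl = TRUE (the PCRE engine); the variable names pattern and result are only illustrative.

```r
text <- c("South African Society", "South African",
          "African Society", "South African Society and African Society")

# Match "African", then reject it only when it ends "South African"
# and is immediately followed by " Society"
pattern <- "African(?<!South African(?= Society))"

result <- grepl(pattern, text, perl = TRUE)
print(result)
# [1] FALSE  TRUE  TRUE  TRUE
```

The fourth string is TRUE because its second African is preceded by "and ", so the lookbehind finds no South African there.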

Alternatively, you may use a SKIP-FAIL technique :

South African Society(*SKIP)(*F)|African

See another regex demo . Here, South African Society is matched first, and (*SKIP)(*F) makes that match fail and moves the search to the position right after it, so African is matched in every context other than South African Society .
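The SKIP-FAIL pattern can be tried on the same vector. This is a sketch with base R and perl = TRUE, which is needed because (*SKIP) and (*F) are PCRE verbs; the variable names pattern, result and starts are only illustrative.

```r
text <- c("South African Society", "South African",
          "African Society", "South African Society and African Society")

# Consume and discard every "South African Society", otherwise match "African"
pattern <- "South African Society(*SKIP)(*F)|African"

result <- grepl(pattern, text, perl = TRUE)
print(result)
# [1] FALSE  TRUE  TRUE  TRUE

# Start position of each surviving match in the fourth string
starts <- as.integer(gregexpr(pattern, text[4], perl = TRUE)[[1]])
print(starts)
# [1] 27
```

In the fourth string only the second African (starting at character 27) is reported, since the first one sits inside the skipped South African Society.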


 