
Regex to get character not between two other characters

How would one use Regex to get a character/word that is not between two other characters/words?

For example, in:

hello world [hello hello] world hello [world hello world hello] world hello [hello] hello

It would select only the four occurrences of hello that are outside the square brackets:

hello world [hello hello] world hello [world hello world hello] world hello [hello] hello

A related question gets the text that is not between two characters, using (?<=^|\])[^[]+ , which is close; all that remains on top of that is to select specific words from that text.

You can take the opposite approach: match what you don't want, which is everything from an opening square bracket up to its closing bracket, then use an alternation (|) and capture what you want to keep in a group.

Using, for example, re.findall, you get the values of the capturing group. Matches of the bracketed alternative come back as empty strings, which you can filter out.

\[[^][]*]|\b(hello)\b


Example code

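import re

# Alternative 1 consumes a whole [...] segment without capturing anything;
# alternative 2 captures hello as a whole word in group 1.
regex = r"\[[^][]*]|\b(hello)\b"

test_str = "hello world [hello hello] world hello [world hello world hello] world hello [hello] hello"

# findall returns group 1 for every match ("" for bracketed segments);
# filter(None, ...) drops the empty strings.
print(list(filter(None, re.findall(regex, test_str))))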

Output

['hello', 'hello', 'hello', 'hello']
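
If the positions of the matches are needed rather than just the words, the same alternation can be used with re.finditer. The following is only a minimal sketch: the helper name find_outside_brackets is hypothetical, and it assumes the brackets are balanced and not nested, as in the example string.

```python
import re

def find_outside_brackets(word, text):
    """Return (start, end) spans of `word` that are not inside [...].

    Hypothetical helper; assumes brackets are balanced and not nested.
    """
    # Same idea as above: consume bracketed segments, capture the word elsewhere.
    pattern = re.compile(r"\[[^][]*]|\b(" + re.escape(word) + r")\b")
    # For a bracketed segment group 1 did not participate, so it is None.
    return [m.span(1) for m in pattern.finditer(text) if m.group(1) is not None]

text = "hello world [hello hello] world hello"
spans = find_outside_brackets("hello", text)
print(spans)  # [(0, 5), (32, 37)]
```

Note that with a match object an unmatched group is None, whereas re.findall reports it as an empty string.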

Alternatively, use the regex module from PyPI:

import regex

text = 'hello world [hello hello] world hello [world hello world hello] world hello [hello] hello'
# (*SKIP)(?!) discards each bracketed match, so only hello outside brackets is replaced.
print(regex.sub(r'\[[^][]*](*SKIP)(?!)|\b(hello)\b', r'++\1++', text))


Output:

++hello++ world [hello hello] world ++hello++ [world hello world hello] world ++hello++ [hello] ++hello++

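The expression \[[^][]*](*SKIP)(?!)|\b(hello)\b first matches a string between square brackets; (*SKIP)(?!) then forces that match to fail and tells the engine not to retry inside it, so the bracketed text is dropped. Outside the brackets, hello is matched between word boundaries and replaced by regex.sub.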
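
The (*SKIP)(?!) verbs are not available in the standard library re module. If installing regex is not an option, a similar result can probably be had by passing a replacement function to re.sub. This is a different technique from the one above, shown only as a sketch: the name mark_outside_brackets and the marker parameter are hypothetical, and non-nested brackets are assumed.

```python
import re

def mark_outside_brackets(word, text, marker="++"):
    """Wrap `word` in `marker` wherever it occurs outside [...].

    Hypothetical helper; assumes brackets are balanced and not nested.
    """
    pattern = re.compile(r"\[[^][]*]|\b(" + re.escape(word) + r")\b")

    def repl(m):
        if m.group(1) is None:
            # Bracketed segment: put it back unchanged.
            return m.group(0)
        return marker + m.group(1) + marker

    return pattern.sub(repl, text)

text = "hello world [hello hello] world hello [hello] hello"
result = mark_outside_brackets("hello", text)
print(result)  # ++hello++ world [hello hello] world ++hello++ [hello] ++hello++
```

Here the bracketed segments are still matched, but the callback returns them as they were instead of skipping them.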

