简体   繁体   中英

Split string on portion of matched regular expression (python)

Say I have a string 'ad>ad>ad>>ad' and I want to split on this on the '>' (not the '>>' chars). Just picked up regex and was wondering if there is a way (special character) to split on a specific part of the matched expression, rather than splitting on the whole matched expression, for example the regex could be:

re.split('[^>]>[^>]', 'ad>ad>ad>>ad')

Can you get it to split on the char in parenthesis [^>](>)[^>] ?

You need to use lookarounds:

re.split(r'(?<!>)>(?!>)', 'ad>ad>ad>>ad')

See the regex demo

The (?<!>)>(?!>) pattern only matches a > that is not preceded with a < (due to the negative lookbehind (?<!>) ) and that is not followed with a < (due to the negative lookahead (?!>) ).

Since lookarounds do not consume the characters (unlike negated (and positive) character classes, like [^>] ), we only match and split on a < symbol without "touching" the symbols around it.

Try with \\b>\\b

This will check for single > surrounded by non-whitespace characters. As the string in the question is continuous stream of characters checking word boundary with \\b is simplest method.

Regex101 Demo

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM