
Regex: split a string into groups by order of occurrence

I am trying to find a way to split a string into groups, based on the order in which its words occur.

Strings are formatted like this: "AAA/BBB/CCC/DDD/BBB/CCC/DDD/BBB/DDD"

I want the string to be split like this:

1 ) AAA/BBB/CCC/DDD

2 ) BBB/CCC/DDD

3 ) BBB/DDD

'/' is always the separator and words are always AAA, BBB, CCC and DDD.

I tried the regex (AAA|BBB|CCC|DDD){x}, with {x} specifying the number of occurrences, but {} seems to work only on characters, not on words.

You can use re.findall with a pattern built from positive lookaheads, so that a slash is included in a match only if it is followed by a word that is allowed to come later in the sequence. The ? quantifier makes each word optional, and because it is greedy, a word is still matched whenever it is present:

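import re
s = 'AAA/BBB/CCC/DDD/BBB/CCC/DDD/BBB/DDD'
# Each word is optional; a slash is consumed only when a later word follows it.
pattern = r'(?=[ABCD])(?:AAA(?:/(?=[BCD]))?)?(?:BBB(?:/(?=[CD]))?)?(?:CCC(?:/(?=D))?)?(?:DDD)?'
print(re.findall(pattern, s))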

This returns:

['AAA/BBB/CCC/DDD', 'BBB/CCC/DDD', 'BBB/DDD']
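The pattern above is written out by hand for four words. As a minimal sketch, assuming the words are known in advance and listed in their allowed order, the same kind of pattern can be generated programmatically. The helper name build_group_pattern is hypothetical, and the lookaheads here test whole words rather than single characters, which is a small variation on the pattern above:

```python
import re

def build_group_pattern(words, sep="/"):
    """Build a findall pattern for runs of `words` in ascending order.

    Hypothetical helper: `words` must be listed in their allowed order.
    """
    escaped = [re.escape(w) for w in words]
    sep = re.escape(sep)
    # A match must start at one of the known words (this avoids empty matches).
    parts = ["(?=" + "|".join(escaped) + ")"]
    for i, word in enumerate(escaped):
        later = escaped[i + 1:]
        if later:
            # Optional word, optionally followed by a separator that must
            # itself be followed by a word later in the sequence.
            parts.append("(?:%s(?:%s(?=%s))?)?" % (word, sep, "|".join(later)))
        else:
            # The last word has nothing after it, so no separator is consumed.
            parts.append("(?:%s)?" % word)
    return "".join(parts)

s = "AAA/BBB/CCC/DDD/BBB/CCC/DDD/BBB/DDD"
pattern = build_group_pattern(["AAA", "BBB", "CCC", "DDD"])
groups = re.findall(pattern, s)
print(groups)  # ['AAA/BBB/CCC/DDD', 'BBB/CCC/DDD', 'BBB/DDD']
```

Because the generated pattern contains only non-capturing groups, re.findall returns the full text of each match.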

Alternatively, you can use re.split with an alternation pattern that matches only those slashes where, as checked by a positive lookbehind and a positive lookahead, the word before the slash comes later in the sequence than the word after it:

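import re
s = 'AAA/BBB/CCC/DDD/BBB/CCC/DDD/BBB/DDD'
# Split only at slashes where the order resets (e.g. D before B, or C before A).
pattern = r'(?:(?<=[BCD])/(?=A)|(?<=[CD])/(?=B)|(?<=D)/(?=C))'
print(re.split(pattern, s))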

This returns:

['AAA/BBB/CCC/DDD', 'BBB/CCC/DDD', 'BBB/DDD']
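The splitting rule can also be expressed without a regex, which may be easier to read or adapt. The following is only a sketch under the same assumptions (a fixed, ordered set of words and '/' as the separator). The function name split_by_order and its parameters are made up for illustration, and a word outside the known set would raise a KeyError:

```python
def split_by_order(s, order=("AAA", "BBB", "CCC", "DDD"), sep="/"):
    """Split `s` wherever a word is followed by one that ranks earlier."""
    rank = {word: i for i, word in enumerate(order)}
    groups = []
    current = []
    for word in s.split(sep):
        # Start a new group when the previous word ranks later than this one,
        # mirroring the lookbehind/lookahead pairs in the re.split pattern.
        if current and rank[current[-1]] > rank[word]:
            groups.append(sep.join(current))
            current = []
        current.append(word)
    if current:
        groups.append(sep.join(current))
    return groups

result = split_by_order("AAA/BBB/CCC/DDD/BBB/CCC/DDD/BBB/DDD")
print(result)  # ['AAA/BBB/CCC/DDD', 'BBB/CCC/DDD', 'BBB/DDD']
```

Like the re.split pattern, this version splits only when the order strictly goes backwards, so a word repeated twice in a row stays in the same group.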


 