Regex: split by occurrence groups

Question

I am trying to split a string into groups based on the order in which its words occur.

Strings are formatted like this: "AAA/BBB/CCC/DDD/BBB/CCC/DDD/BBB/DDD"

I want the string to split like this:

1) AAA/BBB/CCC/DDD

2) BBB/CCC/DDD

3) BBB/DDD

'/' is always the separator and words are always AAA, BBB, CCC and DDD.

I tried the regular expression (AAA|BBB|CCC|DDD){x}, with {x} specifying the number of occurrences, but it seems {} works only for characters, not words.
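On the {x} attempt: in Python's re module a quantifier applies to the preceding item, and that item can be a whole group, so {} can repeat words as well as characters. A minimal sketch, assuming a fixed group size of four words (the names word, pattern, and first_four are illustrative and not from the question):

```python
import re

s = 'AAA/BBB/CCC/DDD/BBB/CCC/DDD/BBB/DDD'

# One allowed word, as a non-capturing group.
word = r'(?:AAA|BBB|CCC|DDD)'

# {3} repeats the whole group "/word" three times, giving four words in total.
pattern = re.compile(word + r'(?:/' + word + r'){3}')

first_four = pattern.match(s).group(0)
print(first_four)
```

A fixed count cannot by itself handle groups of different lengths, such as BBB/DDD, which is why the answers rely on word order instead.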

Answer 1

You can use re.findall with a pattern that lists the words in order and makes each one optional with ? (which is greedy, so a word is matched whenever it is present). Positive lookaheads ensure that a slash is included only when it is followed by a character allowed later in the sequence, and the leading (?=[ABCD]) prevents empty matches:

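import re
s = 'AAA/BBB/CCC/DDD/BBB/CCC/DDD/BBB/DDD'
# Each word is optional; a slash is consumed only if a later word follows it.
pattern = r'(?=[ABCD])(?:AAA(?:/(?=[BCD]))?)?(?:BBB(?:/(?=[CD]))?)?(?:CCC(?:/(?=D))?)?(?:DDD)?'
print(re.findall(pattern, s))

This prints: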

['AAA/BBB/CCC/DDD', 'BBB/CCC/DDD', 'BBB/DDD']
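The same idea can be generated from an ordered word list instead of being written by hand. This is only a sketch under assumptions: build_ordered_pattern is a hypothetical helper, not part of the answer, and it uses whole-word lookaheads rather than the single-character classes above.

```python
import re

def build_ordered_pattern(words, sep='/'):
    """Build a findall pattern in which each word is optional, in the given order."""
    esc_sep = re.escape(sep)
    parts = []
    for i, w in enumerate(words):
        later = words[i + 1:]
        if later:
            # Consume the separator only if one of the later words follows it.
            ahead = '|'.join(re.escape(x) for x in later)
            parts.append(f'(?:{re.escape(w)}(?:{esc_sep}(?={ahead}))?)?')
        else:
            # The last word is never followed by a separator inside a group.
            parts.append(f'(?:{re.escape(w)})?')
    # A leading lookahead requires a word at the match start, avoiding empty matches.
    start = '|'.join(re.escape(x) for x in words)
    return re.compile(f'(?={start})' + ''.join(parts))

s = 'AAA/BBB/CCC/DDD/BBB/CCC/DDD/BBB/DDD'
groups = build_ordered_pattern(['AAA', 'BBB', 'CCC', 'DDD']).findall(s)
print(groups)
```

Because the pattern contains no capturing groups, findall returns the full text of each match.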

Answer 2

You can use re.split with an alternation pattern that matches a slash only when a positive lookbehind and a positive lookahead show that the character before the slash comes later in the sequence than the character after it:

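import re
s = 'AAA/BBB/CCC/DDD/BBB/CCC/DDD/BBB/DDD'
# Split only at slashes where the word order goes backwards (e.g. D before B).
pattern = r'(?:(?<=[BCD])/(?=A)|(?<=[CD])/(?=B)|(?<=D)/(?=C))'
print(re.split(pattern, s))

This prints: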

['AAA/BBB/CCC/DDD', 'BBB/CCC/DDD', 'BBB/DDD']
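For comparison, the rule this pattern encodes (start a new group whenever a word comes earlier in the sequence than the word before it) can also be written without a regex. This is a swapped-in plain-Python sketch, not the answer's method; split_on_order_reset and its parameters are hypothetical names, and it assumes the input contains only the known words.

```python
def split_on_order_reset(s, order=('AAA', 'BBB', 'CCC', 'DDD'), sep='/'):
    """Split s into groups, starting a new group when the word order goes backwards."""
    rank = {w: i for i, w in enumerate(order)}
    groups = []
    current = []
    for word in s.split(sep):
        # Strict "<" mirrors the regex: a split happens only when the previous
        # word is later in the sequence than the current one.
        if current and rank[word] < rank[current[-1]]:
            groups.append(sep.join(current))
            current = []
        current.append(word)
    if current:
        groups.append(sep.join(current))
    return groups

result = split_on_order_reset('AAA/BBB/CCC/DDD/BBB/CCC/DDD/BBB/DDD')
print(result)
```

A word outside order raises KeyError here, which makes unexpected input visible instead of silently passing it through.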

solution1
1 2019-03-28 20:25:45

solution2
1 2019-03-28 20:33:06