How to create a regex for an inline ordered list?

Question

I have a form field that must contain only an inline ordered list:

1. This item may contain characters, symbols or numbers. 2. And this item also...

The following code does not work for validating user input (users may enter only an inline ordered list):

definiton_re = re.compile(r'^(?:\d\.\s(?:.+?))+$')
validate_definiton = RegexValidator(definiton_re, _("Enter a valid 'definition' in format: 1. meaning #1, 2. meaning #2...etc"), 'invalid')

PS: I'm using the RegexValidator class from the Django framework to validate the form field value.

Answer 1

Here is my solution. It works reasonably well.

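import re
from django.core.validators import RegexValidator
from django.utils.translation import gettext_lazy as _  # ugettext_lazy on older Django

input = '1. List item #1, 2. List item 2, 3. List item #3.'
regex = re.compile(r'(?:^|\s)(?:\d{1,2}\.\s)(.+?)(?=(?:, \d{1,2}\.)|$)')
# Parsing: findall() returns the captured text of each list item.
regex.findall(input)  # Result: ['List item #1', 'List item 2', 'List item #3.']
# Validation: RegexValidator raises ValidationError when the pattern is not found.
validate_input = RegexValidator(regex, _("Input must be in format: 1. any thing..., 2. any thing...etc"), 'invalid')
validate_input(input)  # No errors.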

Answer 2

Nice solution from OP. To push it further, let's do some regex optimization / golfing.

(?<!\S)\d{1,2}\.\s((?:(?!,\s\d{1,2}\.),?[^,]*)+)

Here's what's new:

(?:^|\s) matches with backtracking between the alternatives. Here we use (?<!\S) instead, which asserts that the current position is not preceded by a non-whitespace character.
\d{1,2}\.\s doesn't have to be within a non-capturing group.
(.+?)(?=(?:, \d{1,2}\.)|$) is too bulky. We change this bit to:
- ( Capturing group
- (?:
- (?! Negative lookahead: assert that our position is NOT followed by:
- ,\s\d{1,2}\. A comma, a whitespace character, then a list index.
- )
- ,?[^,]* Here's the interesting optimization:
- - We match a comma if there is one, because we know from our lookahead assertion that this position does not start a new list index. Therefore we can safely assume that the following run of non-comma characters (if any) does not belong to the next element, so we roll over it with the * quantifier, and there's no backtracking.
- - This is a significant improvement over (.+?).
- )+ Keep repeating the group until the negative lookahead assertion fails.
- )

You can use that in place of the regex in the other answer, and here's a regex demo!
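As a quick check, here is a minimal sketch that runs the optimized pattern against the sample input from the other answer. The names optimized, sample, items and with_commas are chosen here for illustration only, and the second input is a made-up example showing that a comma inside an item is kept as long as it does not start a new list index.

```python
import re

# Optimized pattern from this answer, compiled as a raw string.
optimized = re.compile(r'(?<!\S)\d{1,2}\.\s((?:(?!,\s\d{1,2}\.),?[^,]*)+)')

# Sample input taken from the other answer.
sample = '1. List item #1, 2. List item 2, 3. List item #3.'
items = optimized.findall(sample)
print(items)  # ['List item #1', 'List item 2', 'List item #3.']

# Illustrative input: a comma that is not followed by "<index>." stays in the item.
with_commas = optimized.findall('1. a, b, 2. c')
print(with_commas)  # ['a, b', 'c']
```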

Though, at first glance, this problem is better solved with re.split() while parsing:

import re

input = '1. List item #1, 2. List item 2, 3. List item #3.'
lines = re.split(r'(?:^|, )\d{1,2}\. ', input)
# Gives ['', 'List item #1', 'List item 2', 'List item #3.']
if lines[0] == '':
    # Throw away the empty first element produced by splitting at the start.
    lines = lines[1:]
print(lines)

Here is an online code demo.

Unfortunately, for validation you still have to use the regex-matching approach; just compile the regex shown above:

regex = re.compile(r'(?<!\S)\d{1,2}\.\s((?:(?!,\s\d{1,2}\.),?[^,]*)+)')
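To see roughly how this pattern behaves as a validator without setting up Django, here is a small sketch. It assumes the validator does a search-style check (Django documents RegexValidator as using re.search()). The helper looks_like_inline_list is a hypothetical name introduced here, not part of Django.

```python
import re

regex = re.compile(r'(?<!\S)\d{1,2}\.\s((?:(?!,\s\d{1,2}\.),?[^,]*)+)')

def looks_like_inline_list(value):
    # Hypothetical helper: mimics a search-based validator, which accepts
    # the value if the pattern is found anywhere in it.
    return regex.search(value) is not None

ok = looks_like_inline_list('1. List item #1, 2. List item 2')
no_items = looks_like_inline_list('no list items here')
leading_text = looks_like_inline_list('intro text 1. item')

print(ok)            # True
print(no_items)      # False
print(leading_text)  # True: search() accepts a match anywhere in the string
```

Because the check is search-based, text before the first index still passes. If that matters for your form, the pattern would need anchoring, which is outside what either answer covers.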

2 answers

solution1
0 2014-08-17 17:05:22

solution2
0 ACCEPTED 2014-08-17 18:11:15