Regex matching end of list element

My aim is to replace all number words inside a list tag with the corresponding numbered bullets. For example, with the following input:

<list>one goto school     two do play     three comeback      <!list>

I want the following output, but the matching should stop at the end of list:

<list>xx. goto school
|NEWLIN xx. do play
|NEWLIN xx. comeback

The regular expression suggested in the answer (also copied below) solves it, but it does not stop matching at the end of the list.

((?<=\<list\>)|(?<=\|NEWLIN ))(one|two|three|four|five|six|seven|eight|nine)

I suggest matching the blocks between <list> and <!list> with (?s)<list>.*?<!list> and then replacing what you need only inside those matched blocks.
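As a quick check of that first step, the sketch below runs the block pattern on a made-up string (the trailing "|NEWLIN three outside the list" line is a hypothetical addition, not part of the question's input) to show that text after <!list> is not captured:

```python
import re

# Hypothetical sample: the last line sits outside the list block.
text = "<list>one goto school\n|NEWLIN two do play\n<!list>\n|NEWLIN three outside the list"

# (?s) lets . match line breaks; the lazy .*? stops at the first <!list>.
blocks = re.findall(r"(?s)<list>.*?<!list>", text)
print(blocks)
```

Any replacement that is then applied only to the strings in blocks cannot touch the "three" that follows the closing tag.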

Here is a sample solution that can be further improved:

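import re

s = "<list>one goto school\n|NEWLIN two do play\n|NEWLIN three comeback\n <!list>"

def repl(m):
    # Map each number word to its digit
    l = {'one':'1', 'two':'2', 'three':'3', 'four':'4', 'five':'5', 'six':'6', 'seven':'7', 'eight':'8', 'nine':'9'}
    # Build the alternation from the dictionary keys (works in Python 2 and 3, unlike iteritems())
    k = r"|".join(l)
    # Replace number words only right after <list> or "|NEWLIN ", and only inside the matched block
    return re.sub(r"(?:(?<=<list>)|(?<=\|NEWLIN ))(?:{})".format(k), lambda x: "{}.".format(l[x.group()]), m.group())

# Process each <list>...<!list> block separately
res = re.sub(r"(?s)<list>.*?<!list>", repl, s)
print(res)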

See the Python demo

Details:

  • The (?s)<list>.*?<!list> regex matches <list>, then any 0+ chars, as few as possible (the (?s) modifier lets . match any char, including line break chars), and then <!list>.
  • In the outer re.sub call, the repl callback is passed as the replacement; it receives each match object and processes it.
  • Inside repl, a dictionary with the necessary replacements is defined, and its keys are used to build a regex with an alternation and two lookbehinds (these can easily be changed into capturing groups, but the code will grow a tiny bit longer). The inner re.sub receives a lambda as the replacement, which uses the matched value to fetch the right dictionary value.
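For reference, here is a minimal sketch of the capturing-group variant mentioned in the last point. The names NUMBERS, ITEM, number_items and replace_in_lists are made up for this illustration, the trailing \b is an extra assumption (it keeps a word such as "oneself" from being treated as "one"), and the "|NEWLIN four outside" line is added to the sample only to show that text after <!list> is left alone:

```python
import re

# Hypothetical helper names; the mapping is the one used in the answer.
NUMBERS = {'one': '1', 'two': '2', 'three': '3', 'four': '4', 'five': '5',
           'six': '6', 'seven': '7', 'eight': '8', 'nine': '9'}

# Group 1 captures the item prefix, group 2 the number word.
# The \b is an assumption added here so only whole words are replaced.
ITEM = re.compile(r"(<list>|\|NEWLIN )({})\b".format("|".join(NUMBERS)))

def number_items(block_match):
    # Rewrite number words inside one <list>...<!list> block,
    # re-emitting the captured prefix in front of the digit.
    return ITEM.sub(lambda m: m.group(1) + NUMBERS[m.group(2)] + ".",
                    block_match.group())

def replace_in_lists(text):
    # Only text between <list> and the first following <!list> is processed.
    return re.sub(r"(?s)<list>.*?<!list>", number_items, text)

sample = ("<list>one goto school\n|NEWLIN two do play\n"
          "|NEWLIN three comeback\n <!list>\n|NEWLIN four outside")
print(replace_in_lists(sample))
```

Compared with the lookbehind version, the prefix has to be put back by hand in the replacement, which is the small amount of extra code the answer refers to.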

