
Regex matching end of list element

My aim is to replace every number word inside a list tag with the corresponding numbered bullet. For example, with the following input:

<list>one goto school     two do play     three comeback      <!list>

I want the following output, but the matching should stop at the end of the list:

<list>xx. goto school
|NEWLIN xx. do play
|NEWLIN xx. comeback
 <!list>    

The regular expression suggested in an earlier answer (copied below) performs the replacement, but it does not stop matching at the end of the list.

((?<=\<list\>)|(?<=\|NEWLIN ))(one|two|three|four|five|six|seven|eight|nine)
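To make the problem concrete, here is a minimal sketch that runs the pattern above against a hypothetical input in which a |NEWLIN line also appears after the closing <!list> tag. The sample text is an assumption made for illustration, not data from the question. Because the lookbehinds only inspect the characters immediately before the word, the pattern has no notion of where the list ends.

```python
import re

# Pattern from the question: a number word directly after <list> or "|NEWLIN ".
pattern = re.compile(
    r"((?<=<list>)|(?<=\|NEWLIN ))(one|two|three|four|five|six|seven|eight|nine)"
)

# Hypothetical input: the last line sits after the closing <!list> tag.
text = "<list>one goto school\n|NEWLIN two do play\n <!list>\n|NEWLIN three outside"

# Collect every number word the pattern finds, inside or outside the list.
matches = [m.group(2) for m in pattern.finditer(text)]
print(matches)
```

The word on the line after <!list> is matched as well, which is the behavior the question wants to avoid.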

I suggest matching the blocks between <list> and <!list> with (?s)<list>.*?<!list> and then replacing what you need only inside those matches.

Here is a sample solution that can be further improved:

import re

s = "<list>one goto school\n|NEWLIN two do play\n|NEWLIN three comeback\n <!list>"

def repl(m):
    # Map each number word to its digit
    l = {'one':'1', 'two':'2', 'three':'3', 'four':'4', 'five':'5', 'six':'6', 'seven':'7', 'eight':'8', 'nine':'9'}
    # Join the keys into an alternation (dict.iteritems() exists only in Python 2)
    k = "|".join(l)
    # Replace a number word only when it directly follows <list> or "|NEWLIN "
    return re.sub(r"(?:(?<=<list>)|(?<=\|NEWLIN ))(?:{})".format(k), lambda x: "{}.".format(l[x.group()]), m.group())

# Apply the replacement only inside each <list>...<!list> block
res = re.sub(r"(?s)<list>.*?<!list>", repl, s)
print(res)

See the Python demo

Details:

  • The (?s)<list>.*?<!list> regex matches <list>, then any 0+ chars, as few as possible (the (?s) modifier lets . match any char, including line breaks), and then <!list>.
  • The repl function is passed to re.sub as the callback, so each matched block is processed through its match object.
  • Inside repl, the dictionary with the necessary replacements is defined, and its keys are used to build a regex with an alternation and two lookbehinds (this can easily be changed to capturing groups, but the code will grow a tiny bit longer). The inner re.sub receives a lambda as the replacement, which uses the matched value to fetch the right dictionary value.
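The capturing-group variant mentioned in the last point could look roughly like the sketch below. The names WORD_TO_DIGIT, ITEM_RE, BLOCK_RE and number_items are hypothetical, the trailing \b is an extra safeguard that is not part of the original answer, and the sample text (with one |NEWLIN line outside the list) is an assumption made for illustration.

```python
import re

# Number words and the digits that replace them.
WORD_TO_DIGIT = {
    'one': '1', 'two': '2', 'three': '3', 'four': '4', 'five': '5',
    'six': '6', 'seven': '7', 'eight': '8', 'nine': '9',
}

# Group 1 keeps the prefix (<list> or "|NEWLIN "), group 2 is the number word.
# The trailing \b (an assumption, not in the original) avoids partial-word hits.
ITEM_RE = re.compile(r"(<list>|\|NEWLIN )({})\b".format("|".join(WORD_TO_DIGIT)))

# One <list>...<!list> block; (?s) lets . cross line breaks.
BLOCK_RE = re.compile(r"(?s)<list>.*?<!list>")

def number_items(block_match):
    # Rewrite number words inside a single matched block, re-emitting the prefix.
    return ITEM_RE.sub(
        lambda m: "{}{}.".format(m.group(1), WORD_TO_DIGIT[m.group(2)]),
        block_match.group(),
    )

# Hypothetical input: the last line sits after the closing <!list> tag.
text = "<list>one goto school\n|NEWLIN two do play\n <!list>\n|NEWLIN three outside"
result = BLOCK_RE.sub(number_items, text)
print(result)
```

Since the prefix is consumed by group 1 rather than checked by a lookbehind, the replacement has to write it back, which is why the code is slightly longer. Text after <!list> is left unchanged because the inner substitution only ever sees the matched block.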
