
Extracting multiple occurrences between 2 delimiters in a line

How do we generically get the numbers into a list? The delimiters may be "(" and ")", "[" and "]", "{" and "}", or even "start" and "end".

import re

line = "-(123) = (456) = (789)-"

result = re.findall(r"\([^']*\)", line)

for i in result:
    print(i)

The numbers or any contents between the 2 delimiters are what we want to put in a list.

What you have here is a greedy match: the * matches as many characters as possible, so the pattern runs from the first ( to the last ), giving just one large match.

Use a non-greedy match instead: \([^']*?\)

If you want to skip the delimiters, use capturing parens: \(([^']*?)\)
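To make the difference concrete, here is a small sketch that runs the three patterns against the line from the question. The variable names greedy, lazy and captured are only for illustration.

```python
import re

line = "-(123) = (456) = (789)-"

# Greedy: [^']* runs to the last ")" on the line, so there is one big match.
greedy = re.findall(r"\([^']*\)", line)
print(greedy)    # ['(123) = (456) = (789)']

# Non-greedy: *? stops at the nearest ")", so each pair matches separately.
lazy = re.findall(r"\([^']*?\)", line)
print(lazy)      # ['(123)', '(456)', '(789)']

# With a capturing group, findall returns only the captured contents.
captured = re.findall(r"\(([^']*?)\)", line)
print(captured)  # ['123', '456', '789']
```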

Regex101 link: https://regex101.com/r/5wYz7v/1

I submit there is only one easy way to do this: a two-step process. First find every delimited element, then trim the delimiters off each one.

>>> import re
>>> line = r' [one] (two) {three} startfourend '
>>> ary = re.findall( r'(\([^)]*\)|\[[^\]]*\]|{[^}]*}|start(?:(?!end)[\S\s])*end)', line)
>>> ary = [ re.sub(r'^(?:[\[({]|start)|(?:[\])}]|end)$', '', element) for element in ary ]
>>> print (ary)
['one', 'two', 'three', 'four']
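For delimiters that are not known in advance, one possible variation (a different technique from the two-step process, not part of this answer) is to build the pattern from a list of opener/closer pairs with re.escape and capture the contents lazily. This is only a sketch: the function name extract_between and its pairs parameter are hypothetical, and it assumes delimiters are plain strings that do not nest.

```python
import re

def extract_between(line, pairs):
    """Return the text found between any of the (opener, closer) pairs."""
    # One alternative per pair; re.escape makes "(", "[" and friends literal.
    # (.*?) is lazy, so each match stops at the nearest closer.
    alternatives = [
        f"{re.escape(opener)}(.*?){re.escape(closer)}"
        for opener, closer in pairs
    ]
    pattern = re.compile("|".join(alternatives), re.DOTALL)

    results = []
    for match in pattern.finditer(line):
        # Only the group belonging to the pair that matched is not None.
        results.append(next(g for g in match.groups() if g is not None))
    return results

pairs = [("(", ")"), ("[", "]"), ("{", "}"), ("start", "end")]
print(extract_between(" [one] (two) {three} startfourend ", pairs))
# ['one', 'two', 'three', 'four']
print(extract_between("-(123) = (456) = (789)-", pairs))
# ['123', '456', '789']
```

Because the contents are captured directly, no second trimming pass is needed; the trade-off is that every pair adds a group to the pattern.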

Regex for findall - to find all the elements (expanded form; the spaces and comments are for readability only)

 (                             # (1 start)
      \( [^)]* \)
   |  
      \[ [^\]]* \]
   |  
      { [^}]* }
   |  
      start  
      (?:
           (?! end )
           [\S\s]    
      )*
      end
 )                             # (1 end)
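The expanded layout above is close to what Python's re.VERBOSE flag accepts, so it can be compiled nearly as written. A minimal sketch, assuming the name FIND_DELIMITED (not in the original) and with the braces escaped for clarity:

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows "#" comments.
FIND_DELIMITED = re.compile(r"""
    (                          # group 1: one whole delimited element
        \( [^)]* \)            # ( ... )
      | \[ [^\]]* \]           # [ ... ]
      | \{ [^}]* \}            # { ... }
      | start                  # start ... end
        (?: (?! end ) [\S\s] )*
        end
    )
""", re.VERBOSE)

line = r' [one] (two) {three} startfourend '
print(FIND_DELIMITED.findall(line))
# ['[one]', '(two)', '{three}', 'startfourend']
```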

Regex for sub - trimming the array elements (expanded form; strips one leading and one trailing delimiter)

    ^ 
    (?: [\[({] | start )
 |  
    (?: [\])}] | end )
    $

Note that if you also want to trim whitespace from the elements,
change the sub regex to this

    ^ 
    (?: [\[({] | start )
    \s* 
 |  
    \s* 
    (?: [\])}] | end )
    $
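As a quick check of the whitespace-trimming variant, the sketch below applies its compact form to a few elements with padding inside the delimiters. The name TRIM and the sample elements are made up for illustration.

```python
import re

# Compact form of the trim regex above, with optional whitespace
# next to the leading and trailing delimiter.
TRIM = re.compile(r"^(?:[\[({]|start)\s*|\s*(?:[\])}]|end)$")

elements = ["[ one ]", "( two)", "{three }", "start four end"]
trimmed = [TRIM.sub("", element) for element in elements]
print(trimmed)  # ['one', 'two', 'three', 'four']
```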

