
Extract a string between two sets of patterns in Python

I am trying to extract a substring between two sets of patterns using re.search().

On the left, there can be either 0x or 0X, and on the right there can be either U, a space, or \n. The result should not contain the boundary patterns. For example, 0x1234U should result in 1234.

I tried the following search pattern: (0x|0X)(.*)(U| |\n), but it includes the left and right patterns in the result.

What would be the correct search pattern?

You could use a combination of lookbehind and lookahead with a non-greedy match pattern in between:

import re
   
pattern = r"(?<=0[xX])(.*?)(?=[U\s\n])"

re.findall(pattern,"---0x1234U...0X456a ")

['1234', '456a']
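
With re.search(), as in the question, the same lookaround idea means the whole match (group 0) already excludes the boundaries. The following is a minimal sketch; the sample strings are made up for illustration, and since \s already covers \n the character class is shortened to [U\s].

```python
import re

# Lookbehind and lookahead assert the boundaries without consuming them,
# so group(0) contains only the text in between.
pattern = re.compile(r"(?<=0[xX]).*?(?=[U\s])")

# Illustrative inputs: one for each right-hand delimiter (U, space, newline).
samples = ["0x1234U", "0Xbeef value", "0x00ff\n"]

results = [pattern.search(s).group(0) for s in samples]
print(results)  # ['1234', 'beef', '00ff']
```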

You could also use a single capture group and read it with .group(1):

0[xX](.*?)[U\s]

The pattern matches:

  • 0[xX] Match either 0x or 0X
  • (.*?) Capture in group 1 any character except a newline, as few as possible
  • [U\s] Match either U or a whitespace character (which also matches a newline)
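
Note that (.*?) also accepts an empty string or non-hex text between the delimiters. If the value is always hexadecimal (an assumption, the question does not say so), a narrower character class may be preferable. A rough sketch, where extract_hex is a hypothetical helper name:

```python
import re

# Assumption: the text between the delimiters consists only of hex digits.
hex_pattern = re.compile(r"0[xX]([0-9a-fA-F]+)[U\s]")

def extract_hex(text):
    """Return the hex digits between 0x/0X and U/whitespace, or None."""
    m = hex_pattern.search(text)
    return m.group(1) if m else None

print(extract_hex("0x1234U"))  # 1234
print(extract_hex("0x U"))     # None, because at least one hex digit is required

# For comparison, the lazy pattern matches "0x U" with an empty group 1.
lazy_empty = re.search(r"0[xX](.*?)[U\s]", "0x U").group(1)
print(repr(lazy_empty))        # ''
```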


import re

s = r"0x1234U"
pattern = r"0[xX](.*?)[U\s]"

m = re.search(pattern, s)
if m:
    print(m.group(1))

Output

1234
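
If the input may contain several values, the same capture-group pattern can be applied with re.finditer() or re.findall(). A small sketch with a made-up input string:

```python
import re

pattern = re.compile(r"0[xX](.*?)[U\s]")

# Illustrative text with all three right-hand delimiters: U, space, newline.
text = "a = 0x1234U; b = 0XABCD \nc = 0x7f\n"

# finditer yields match objects; group(1) is the captured middle part.
values = [m.group(1) for m in pattern.finditer(text)]
print(values)  # ['1234', 'ABCD', '7f']

# With exactly one capture group, findall returns the group contents directly.
print(pattern.findall(text))  # ['1234', 'ABCD', '7f']
```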


 