I am trying to extract a substring between two sets of patterns using re.search().
On the left, there can be either 0x or 0X, and on the right there can be either U, a space, or \n. The result should not contain the boundary patterns. For example, 0x1234U should result in 1234.
I tried the following search pattern: (0x|0X)(.*)(U| |\n), but it includes the left and right patterns in the result.
What would be the correct search pattern?
You could use a combination of lookbehind and lookahead with a non-greedy match pattern in between:
import re
pattern = r"(?<=0[xX])(.*?)(?=[U\s\n])"
re.findall(pattern,"---0x1234U...0X456a ")
['1234', '456a']
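Since the question asks about re.search() specifically, here is a minimal sketch of the same lookaround idea used with search instead of findall. The helper name extract_first is only an illustration, not part of the original answer. Because the lookbehind and lookahead assert their text without consuming it, the whole match (group 0) is already the bare value.

```python
import re

# Same lookaround idea as above: the prefix and the terminator are asserted
# but not consumed, so the full match is just the text in between.
PATTERN = re.compile(r"(?<=0[xX]).*?(?=[U\s])")

def extract_first(text):
    """Return the text between 0x/0X and the first U or whitespace, else None."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

print(extract_first("0x1234U"))    # 1234
print(extract_first("0X456a\n"))   # 456a
print(extract_first("no prefix"))  # None
```

The [U\s] class already covers the newline case, because \s matches \n.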
You could also use a single capture group and read its value with .group(1):
0[xX](.*?)[U\s]
The pattern matches:
- 0[xX]: match either 0x or 0X
- (.*?): capture in group 1 any character except a newline, as few as possible
- [U\s]: match either U or a whitespace character (which could also match a newline)

import re
s = r"0x1234U"
pattern = r"0[xX](.*?)[U\s]"
m = re.search(pattern, s)
if m:
    print(m.group(1))
Output
1234
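One thing to keep in mind: the pattern above needs a U or a whitespace character after the value, so an input such as 0xBEEF at the very end of a string does not match. If the values are known to be hexadecimal, a stricter variant could look like the sketch below. Both the hex-digit restriction and the end-of-string alternative are assumptions that go beyond the question, and the name extract_hex is only illustrative.

```python
import re

# Assumption: the value consists of hex digits only, and reaching the end of
# the string also counts as a valid right-hand boundary.
HEX_PATTERN = re.compile(r"0[xX]([0-9a-fA-F]+)(?:[U\s]|$)")

def extract_hex(text):
    """Return the hex digits after 0x/0X, or None if there is no clean match."""
    m = HEX_PATTERN.search(text)
    return m.group(1) if m else None

print(extract_hex("0x1234U"))  # 1234
print(extract_hex("0XBEEF"))   # BEEF
print(extract_hex("0x12G4U"))  # None
```

The last call returns None because G is neither a hex digit nor one of the allowed terminators, so malformed values are rejected rather than partially captured.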