
regular expression match issue in Python

Given an input string, I want to match text that starts with {(P) and ends with (P)}, and I only want the part in the middle. Can this be done with a single regular expression?

For example, for the input string below, I want to retrieve the hello world part. I am using Python 2.7.

python {(P)hello world(P)} java

You can try {\(P\)(.*)\(P\)} and use parentheses in the pattern to capture everything between {(P) and (P)}:

import re
re.findall(r'{\(P\)(.*)\(P\)}', "python {(P)hello world(P)} java")

# ['hello world']
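One caveat worth illustrating: .* is greedy, so if the input contains more than one tagged span, the capture runs from the first {(P) to the last (P)}. A minimal sketch, using a made-up two-tag input that is not part of the original question, comparing it with the lazy .*? form:

```python
import re

# Hypothetical input with two tagged spans (an assumption for illustration).
text = "a {(P)first(P)} b {(P)second(P)} c"

# Greedy: one capture stretching from the first start tag to the last end tag.
greedy = re.findall(r'{\(P\)(.*)\(P\)}', text)

# Lazy: each capture stops at the nearest end tag.
lazy = re.findall(r'{\(P\)(.*?)\(P\)}', text)

print(greedy)  # ['first(P)} b {(P)second']
print(lazy)    # ['first', 'second']
```

For a single tagged span, as in the question, both forms give the same result.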

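.* also matches non-ASCII characters. In Python 2.7 a plain str holds bytes, so the result below is the UTF-8 encoding of the text, and it prints correctly: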

import re
str1 = "python {(P)£1,073,142.68(P)} java"
str2 = re.findall(r'{\(P\)(.*)\(P\)}', str1)[0]

str2
# '\xc2\xa31,073,142.68'

print str2
# £1,073,142.68
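The \xc2\xa3 in the repr above is the two-byte UTF-8 encoding of the pound sign. As a sketch, assuming the input is held as a unicode string instead (the u'' literal is accepted by Python 2.7 and by Python 3.3+), the same pattern returns the character itself rather than its bytes:

```python
import re

# \u00a3 is the pound sign; the u'' prefix makes this a unicode string in Python 2.7.
text = u"python {(P)\u00a31,073,142.68(P)} java"

match = re.search(r'{\(P\)(.*)\(P\)}', text)
amount = match.group(1)

# 13 characters: the pound sign counts as one, not as two bytes.
print(len(amount))  # 13
print(amount == u"\u00a31,073,142.68")  # True
```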

You can use positive look-arounds to ensure that it only matches if the text is preceded and followed by the start and end tags. For instance, you could use this pattern:

(?<={\(P\)).*?(?=\(P\)})


  • (?<={\(P\)) - Look-behind expression stating that a match must be preceded by {(P).
  • .*? - Matches all text between the start and end tags. The ? makes the star lazy (i.e. non-greedy), which means it will match as little as possible.
  • (?=\(P\)}) - Look-ahead expression stating that a match must be followed by (P)}.
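As a rough usage sketch of the look-around pattern in Python: because the tags sit inside look-arounds, the whole match (group 0) is already the inner text and no capture group is needed. The second input string is a made-up example with two tagged spans.

```python
import re

# Look-behind for the start tag, lazy body, look-ahead for the end tag.
pattern = re.compile(r'(?<={\(P\)).*?(?=\(P\)})')

m = pattern.search("python {(P)hello world(P)} java")
print(m.group(0))  # hello world

# Hypothetical input with two tagged spans; findall returns each inner text.
found = pattern.findall("x {(P)a(P)} y {(P)b(P)} z")
print(found)  # ['a', 'b']
```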

For what it's worth, lazy quantifiers can be less efficient, so if you know that there will be no ( characters in the match, you can use a negated character class instead:

(?<={\(P\))[^(]*(?=\(P\)})
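A short sketch of how the negated character class behaves, including the trade-off just mentioned: text containing a ( between the tags is not matched at all. The second input is a made-up example.

```python
import re

# [^(]* consumes any run of characters other than "(".
negated = re.compile(r'(?<={\(P\))[^(]*(?=\(P\)})')

plain = negated.findall("python {(P)hello world(P)} java")
print(plain)  # ['hello world']

# Hypothetical input whose content contains "(": the pattern finds nothing.
with_paren = negated.findall("python {(P)f(x)(P)} java")
print(with_paren)  # []
```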

You can also do this without regular expressions, as long as (P) appears only in the start and end tags (the braces are not checked):

s = 'python {(P)hello world(P)} java'
r = s.split('(P)')[1]
print(r)
# hello world
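If the braces should be checked too, one possible regex-free variant uses str.find on the full tags. This is only a sketch; the helper name between_tags and its defaults are made up for illustration.

```python
def between_tags(s, start='{(P)', end='(P)}'):
    """Return the text between the first start tag and the next end tag, or None."""
    i = s.find(start)
    if i == -1:
        return None  # no start tag
    i += len(start)
    j = s.find(end, i)
    if j == -1:
        return None  # start tag without a closing tag
    return s[i:j]

print(between_tags('python {(P)hello world(P)} java'))  # hello world
print(between_tags('no tags here'))  # None
```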


 