
Python regex to find string which ends with A OR B, either if A comes first or B (non-greedy search)

I have the following Python code:
data = re.sub("{.{4,9}b .*?[\\r\\n]*?.*? ((.*\\\\()|(..fs24))",string,re.DOTALL)

I want to be able to have two matches for each of the strings below

?{\f1\fs24\b \u1510 ?}\u1489 ?\fs24

and

?{\f1\fs24\b \u1492 ?}\u1494 ?(

in both orderings of the two:

?{\f1\fs24\b \u1492 ?}\u1494 ?(  ?{\f1\fs24\b \u1510 ?}\u1489 ?\fs24
?{\f1\fs24\b \u1510 ?}\u1489 ?\fs24 ?{\f1\fs24\b \u1492 ?}\u1494 ?(

However, the OR operator is greedy: it always tries to satisfy the first operand first, so in both cases it consumes the whole string and gives me only one match...

It took me some time to understand the greediness... To solve it, I tried positive lookahead assertions. I also tried running two separate searches, but the greediness always wins...

I don't know exactly what you're trying to do, but if you want to stop at the first ( or ..fs24 , then you need a negative lookahead that checks, before each character . consumes, that neither terminator starts at that position.

data = re.search(r"{.{4,9}b .*?[\r\n]*?.*? ((?:(?!\(| ..fs24).)*)", string, re.DOTALL)
                                            ^^^^^^^^^^^^^^^^^^^^

If you're matching rather than replacing, you'll need re.search (or re.findall to get multiple matches in one go). In any case, re.sub takes two strings besides the pattern: the replacement and the input.
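As a rough illustration of how the three calls differ, here is a minimal sketch. The sample text and the \d+ pattern are made up for the example and are not from the question. Passing flags by keyword keeps them from being taken as one of re.sub's positional arguments.

```python
import re

# Hypothetical sample text, used only to contrast the three calls.
text = "a1b22c333"

# re.search returns a match object for the first match only (or None).
first = re.search(r"\d+", text)

# re.findall returns every non-overlapping match as a list of strings.
every = re.findall(r"\d+", text)

# re.sub needs pattern, replacement and input string, in that order;
# flags go in by keyword because the fourth positional argument is count.
replaced = re.sub(r"\d+", "#", text, flags=re.DOTALL)

print(first.group(), every, replaced)
```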

Notes:

  • Use raw strings for your regexes to avoid having to double escape metacharacters.
  • .*?[\\r\\n]*?.*? : this part doesn't quite seem useful to me, but I left it there since I don't know what you're trying to do besides stopping at the first ( or ..fs24 .
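The first note can be checked directly. The sketch below compares a doubly escaped literal with its raw-string equivalent; the small pattern and sample text are assumptions chosen for illustration, not the regex from the question.

```python
import re

# The same regex written twice: once with doubled backslashes, once raw.
escaped = "\\(|\\\\fs24"   # regular literal: every backslash is doubled
raw = r"\(|\\fs24"         # raw literal: written as the regex engine sees it

same = escaped == raw

# Hypothetical sample text containing a literal ( and a literal \fs24.
hits = re.findall(raw, r"x( y\fs24")

print(same, hits)
```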

regex demo for the last string where you get 2 matches instead of 1.
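For anyone who cannot open the demo, the following sketch runs the regex above through re.findall on both orderings from the question. The variable names are made up here, and the input strings are assumed to contain literal backslashes exactly as shown in the question. With those inputs, each ordering should produce two captures rather than one.

```python
import re

# Pattern copied from the answer; re.DOTALL lets . match newlines too.
pattern = re.compile(r"{.{4,9}b .*?[\r\n]*?.*? ((?:(?!\(| ..fs24).)*)", re.DOTALL)

# Both orderings from the question, as raw strings so backslashes stay literal.
paren_first = r"?{\f1\fs24\b \u1492 ?}\u1494 ?(  ?{\f1\fs24\b \u1510 ?}\u1489 ?\fs24"
fs24_first = r"?{\f1\fs24\b \u1510 ?}\u1489 ?\fs24 ?{\f1\fs24\b \u1492 ?}\u1494 ?("

for text in (paren_first, fs24_first):
    # findall returns the contents of the single capture group per match.
    captures = pattern.findall(text)
    print(len(captures), captures)
```

Note that the capture group holds only the text before the terminator; the ( or ..fs24 itself is excluded by the lookahead.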


 