
Python Regex: Matching Two Strings Only If Another String Is Not Between Them

I want to match AA*ZZ only if the part matched by * does not contain XX.

Given these two strings:

"IYAABMDHRPONWUYZZ"
"BVAABDMYBXXWZZCKU"

How can I write a regex that matches only the first one?

If you only want to match the characters A-Z, you might use

AA(?:[A-WYZ]|X(?!X))*ZZ

Explanation

  • AA Match literally
  • (?: Non-capturing group
    • [A-WYZ] Match A-Z except X
    • | or
    • X(?!X) Match X and assert that what is directly to the right is not X
  • )* Close the non-capturing group and repeat it 0+ times
  • ZZ Match literally
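As a minimal sketch using Python's re module (the variable names are only illustrative), here is the pattern applied to the two strings from the question:

```python
import re

# AA, then any run of A-Z in which an X is never followed by another X, then ZZ
pattern = re.compile(r"AA(?:[A-WYZ]|X(?!X))*ZZ")

samples = ["IYAABMDHRPONWUYZZ", "BVAABDMYBXXWZZCKU"]

for s in samples:
    m = pattern.search(s)
    # Print the matched substring, or None when XX blocks the match
    print(s, "->", m.group(0) if m else None)
```

The first string yields AABMDHRPONWUYZZ and the second yields no match. Because the class only allows uppercase letters, this version fits inputs made purely of A-Z.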

If other characters can also appear, another option is to use the negated character class [^\sX], which matches any character except X or a whitespace character:

AA(?:[^\sX]|X(?!X))*ZZ
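A short sketch of the negated-class version on a few inputs; the strings in tests are hypothetical, chosen to include digits, lowercase letters, punctuation, and a space:

```python
import re

# Any character except whitespace and X, or an X that is not followed by another X
pattern = re.compile(r"AA(?:[^\sX]|X(?!X))*ZZ")

# Hypothetical inputs for illustration only
tests = ["AAb1-X2ZZ", "AA1XX2ZZ", "AA b ZZ"]

for s in tests:
    m = pattern.search(s)
    print(repr(s), "->", m.group(0) if m else None)
```

Only the first input matches; the second is blocked by XX and the third by the space, which [^\sX] excludes.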

Another option is to use a tempered greedy token, which consumes one character at a time only where XX does not start:

AA(?:(?!XX).)*ZZ
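A small sketch of a tempered greedy token for this question, with XX as the forbidden substring between AA and ZZ, run against the question's two strings:

```python
import re

# Tempered greedy token: consume one character at a time, but only
# at positions where XX does not start
pattern = re.compile(r"AA(?:(?!XX).)*ZZ")

for s in ["IYAABMDHRPONWUYZZ", "BVAABDMYBXXWZZCKU"]:
    m = pattern.search(s)
    print(s, "->", m.group(0) if m else None)
```

Note that . does not match a newline unless re.DOTALL is passed, so a match cannot span lines.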

Posting my original comment to the question as an answer

Apart from the single-regex solutions already posted, consider this approach:

  1. First, find all matches for any text between AA and ZZ, for example with the regex AA(.+)ZZ. Store all matches in a list.
  2. Loop through the list of matches from the previous step (or use a filter function, if available) and remove the ones that contain XX. You don't even need a regex for this step, since most languages, including Python, have built-in substring checks (in Python, the in operator).

What you get in return is a clean solution without any complicated regexes. It is easy to read and maintain, and any new conditions can be applied to the final result.

Here is some code to support it:

import re


test_str = """
IYAABMDHRPONWUYZZ
BVAABDMYBXXWZZCKU
"""

# First step: capture the text between AA and ZZ on each line
# (. does not match a newline, so each match stays within one line)
match_results = re.findall(r"AA(.+)ZZ", test_str)

# Second step: filter out the ones that contain XX
final_results = [match for match in match_results if "XX" not in match]

print(final_results)  # ['BMDHRPONWUY']

The expression assigned to final_results is a list comprehension. Since it's not part of the question, I won't explain it here.
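Step 2 also mentions filter functions. As a sketch, the same filtering can be written with Python's built-in filter instead of a list comprehension, reusing the regex from step 1 and the same sample text:

```python
import re

test_str = """
IYAABMDHRPONWUYZZ
BVAABDMYBXXWZZCKU
"""

# Step 1: capture the text between AA and ZZ on each line
match_results = re.findall(r"AA(.+)ZZ", test_str)

# Step 2: keep only the captures that do not contain XX
final_results = list(filter(lambda match: "XX" not in match, match_results))

print(final_results)  # ['BMDHRPONWUY']
```

Both forms produce the same list; the comprehension is usually considered the more idiomatic of the two in Python.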

My guess is that you might want to design an expression similar to:

^(?!.*(?=AA.*XX.*ZZ).*).*AA.*ZZ.*$

Test

import re

regex = r"^(?!.*(?=AA.*XX.*ZZ).*).*AA.*ZZ.*$"

test_str = """
IYAABMDHRPONWUYZZ
BVAABDMYBXXWZZCKU
AABMDHRPONWUYXxXxXxZZ
"""

print(re.findall(regex, test_str, re.M))

Output

['IYAABMDHRPONWUYZZ', 'AABMDHRPONWUYXxXxXxZZ']

The expression is explained in the top-right panel of regex101.com if you wish to explore, simplify, or modify it, and you can also watch there how it matches against some sample inputs.
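The lookahead (?=...) nested inside the negative lookahead does not change the result, since the leading .* already lets the check start anywhere on the line. As a possible simplification, ^(?!.*AA.*XX.*ZZ).*AA.*ZZ.*$ should reject the same lines; this sketch compares both on the test string above:

```python
import re

original = r"^(?!.*(?=AA.*XX.*ZZ).*).*AA.*ZZ.*$"
# Simplified: reject any line where AA ... XX ... ZZ occurs, otherwise require AA ... ZZ
simplified = r"^(?!.*AA.*XX.*ZZ).*AA.*ZZ.*$"

test_str = """
IYAABMDHRPONWUYZZ
BVAABDMYBXXWZZCKU
AABMDHRPONWUYXxXxXxZZ
"""

# re.M makes ^ and $ match at the start and end of each line
print(re.findall(original, test_str, re.M))
print(re.findall(simplified, test_str, re.M))
```

Both calls print the same two lines; the lowercase x in the third line keeps it from containing XX, so it is not rejected.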

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM