
Python Regex: Matching Two Strings Only If Another String Is Not Between Them

I want to search for AA*ZZ only if * does not contain XX.

For these two strings:

"IYAABMDHRPONWUYZZ"
"BVAABDMYBXXWZZCKU"

How can I make the regex match only the first one?

If you only want to match the characters A-Z, you might use

AA(?:[A-WYZ]|X(?!X))*ZZ

Explanation

  • AA Match AA literally
  • (?: Open a non-capturing group
    • [A-WYZ] Match A-Z except X
    • | Or
    • X(?!X) Match X and assert that what is directly to the right is not X
  • )* Close the non-capturing group and repeat it 0+ times
  • ZZ Match ZZ literally
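A quick way to try the pattern in Python is re.search. The sketch below runs it on the two strings from the question plus a third, made-up string (AAXBXZZ, not from the question) to show that a lone X is still allowed.

```python
import re

# AA, then letters where each step is either a letter other than X
# or an X that is not immediately followed by another X, then ZZ.
pattern = re.compile(r"AA(?:[A-WYZ]|X(?!X))*ZZ")

strings = [
    "IYAABMDHRPONWUYZZ",  # from the question
    "BVAABDMYBXXWZZCKU",  # from the question, contains XX
    "AAXBXZZ",            # hypothetical example: single X characters only
]

for s in strings:
    m = pattern.search(s)
    # group(0) is the whole AA...ZZ match; None means XX blocked it
    print(s, "->", m.group(0) if m else None)
```

This should print AABMDHRPONWUYZZ for the first string, None for the second and AAXBXZZ for the third.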


If there can also be other characters, another option is to use the negated character class [^\sX], which matches any character except X or a whitespace character:

AA(?:[^\sX]|X(?!X))*ZZ
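A small check of the negated character class version. The sample strings are made up for illustration: the first mixes punctuation, digits, lowercase letters and a single X, the second contains XX, and the third contains a space, which [^\sX] also rejects.

```python
import re

# Any character except whitespace or X, or an X not followed by another X.
pattern_any = re.compile(r"AA(?:[^\sX]|X(?!X))*ZZ")

samples = [
    "xxAAb-1_XqZZyy",  # punctuation, digits, lowercase and a single X are fine
    "AAabXXcdZZ",      # contains XX, so no match
    "AAab cdZZ",       # a space is excluded by \s, so no match either
]

for s in samples:
    m = pattern_any.search(s)
    print(repr(s), "->", m.group(0) if m else None)
```

Only the first sample matches, giving AAb-1_XqZZ. Note that [^\sX] excludes only uppercase X, so a lowercase x is accepted.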


Another option is to use a tempered greedy token:

AA(?:(?!XX).)*ZZ
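Adapted to this question (XX as the forbidden string, ZZ as the closing delimiter), the tempered greedy token is AA(?:(?!XX).)*ZZ. The sketch below checks it in Python. The third string is a made-up example showing that, unlike the negated class version, this one also accepts spaces; "." still stops at a newline unless re.DOTALL is passed.

```python
import re

# Tempered greedy token: consume one character at a time, but only at
# positions where XX does not start.
pattern_tgt = re.compile(r"AA(?:(?!XX).)*ZZ")

for s in ["IYAABMDHRPONWUYZZ", "BVAABDMYBXXWZZCKU", "AA-x X-ZZ"]:
    m = pattern_tgt.search(s)
    print(repr(s), "->", m.group(0) if m else None)
```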


Posting my original comment to the question as an answer

Apart from the single-regex solutions already posted, consider this solution:

  1. First, find all matches for any text between AA and ZZ, for example with this regex: AA(.+)ZZ. Store all matches in a list.
  2. Loop through the list of matches from the previous step (or use filter functions, if available) and remove the ones that contain XX. You do not even need a regex for that, as most languages, including Python, have built-in string operations for it (in Python, the in operator).

What you get in return is a clean solution without any complicated regexes. It's easy to read and easy to maintain, and any new conditions can simply be applied to the final result.

To support it with some code:

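import re


test_str = """
IYAABMDHRPONWUYZZ
BVAABDMYBXXWZZCKU
"""

# First step: find all strings between AA and ZZ.
# "." does not match a newline, so each line is searched on its own, and
# because the pattern has one capturing group, findall returns only that group.
match_results = re.findall(r"AA(.+)ZZ", test_str, re.I)

# Second step: filter out the ones that contain XX
# (upper() keeps this check case-insensitive, like the re.I search above)
final_results = [match for match in match_results if "XX" not in match.upper()]

print(final_results)  # ['BMDHRPONWUY']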

The expression assigned to final_results is called a list comprehension. Since it's not part of the question, I won't explain it here.
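Because AA(.+)ZZ contains a capturing group, re.findall returns only the text between AA and ZZ, not the whole AA...ZZ substring. If you want the full match, one option (a sketch; the case-insensitive flag is dropped here for simplicity) is re.finditer, which yields match objects so you can test group(1) and keep group(0):

```python
import re

test_str = """
IYAABMDHRPONWUYZZ
BVAABDMYBXXWZZCKU
"""

# group(0) is the whole AA...ZZ text; group(1) is the part between AA and ZZ,
# which is what the XX check is applied to.
full_matches = [
    m.group(0)
    for m in re.finditer(r"AA(.+)ZZ", test_str)
    if "XX" not in m.group(1)
]

print(full_matches)  # ['AABMDHRPONWUYZZ']
```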

My guess is that you might want to design an expression similar to:

^(?!.*(?=AA.*XX.*ZZ).*).*AA.*ZZ.*$

Test

import re

regex = r"^(?!.*(?=AA.*XX.*ZZ).*).*AA.*ZZ.*$"

test_str = """
IYAABMDHRPONWUYZZ
BVAABDMYBXXWZZCKU
AABMDHRPONWUYXxXxXxZZ
"""

print(re.findall(regex, test_str, re.M))

Output

['IYAABMDHRPONWUYZZ', 'AABMDHRPONWUYXxXxXxZZ']

If you wish to explore, simplify, or modify the expression, regex101.com explains it in its top-right panel, and you can also watch there how it matches against some sample inputs.
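As one possible simplification: inside the negative lookahead, the (?=...) wrapper and the trailing .* do not change which lines are rejected, so ^(?!.*AA.*XX.*ZZ).*AA.*ZZ.*$ should behave the same. The sketch below compares both patterns on the same test input. Keep in mind that either version returns whole lines, and rejects a line if XX appears between any AA and any later ZZ on it.

```python
import re

original = r"^(?!.*(?=AA.*XX.*ZZ).*).*AA.*ZZ.*$"
# Same negative lookahead without the redundant (?=...) wrapper and trailing .*
simplified = r"^(?!.*AA.*XX.*ZZ).*AA.*ZZ.*$"

test_str = """
IYAABMDHRPONWUYZZ
BVAABDMYBXXWZZCKU
AABMDHRPONWUYXxXxXxZZ
"""

# re.M makes ^ and $ match at each line boundary
print(re.findall(simplified, test_str, re.M))
print(re.findall(original, test_str, re.M) == re.findall(simplified, test_str, re.M))
```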
