Python Regex Matching Two Strings If another String not Between
I want to search for AA*ZZ only if * does not contain XX.
For 2 strings:
"IY**AA**BMDHRPONWUY**ZZ**"
"BV**AA**BDMYB**XX**W**ZZ**CKU"
How can I make the regex match only the first one?
If you only want to match the characters A-Z, you might use:
AA(?:[A-WYZ]|X(?!X))*ZZ
Explanation
AA  Match literally AA
(?:  Non-capturing group
[A-WYZ]  Match A-Z except X
|  or
X(?!X)  Match X and assert that what is directly to the right is not X
)*  Close the non-capturing group and repeat it 0+ times
ZZ  Match literally ZZ
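As a quick check of the pattern explained above, here is a minimal sketch that runs it against the two strings from the question. The helper name `find_match` is made up for this illustration.

```python
import re

# Pattern from the answer: uppercase letters only, and an X is allowed
# only when it is not immediately followed by another X.
pattern = re.compile(r"AA(?:[A-WYZ]|X(?!X))*ZZ")

samples = ["IYAABMDHRPONWUYZZ", "BVAABDMYBXXWZZCKU"]


def find_match(text):
    """Return the matched substring, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None


for s in samples:
    print(s, "->", find_match(s))
```

The first string yields the substring from AA through ZZ. The second yields None, because the group cannot get past the XX.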
If there can also be other characters, another option could be to use a negated character class [^\sX], matching any char except X or a whitespace char:
AA(?:[^\sX]|X(?!X))*ZZ
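To see how the negated character class behaves with characters other than uppercase letters, here is a small sketch. The sample strings and the helper name `find_match_any` are invented for this illustration and are not from the question.

```python
import re

# Any non-whitespace char except X, or an X that is not followed by another X.
pattern = re.compile(r"AA(?:[^\sX]|X(?!X))*ZZ")


def find_match_any(text):
    """Return the matched substring, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(find_match_any("AA1x-2XYZZ"))  # digits, lowercase, '-' and a single X are allowed
print(find_match_any("AA1x-XX2ZZ"))  # XX between AA and ZZ: no match
print(find_match_any("AA1 2ZZ"))     # whitespace between AA and ZZ: no match
```

Because `\s` is excluded, this variant never matches across spaces or line breaks.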
Another option is to use a tempered greedy token:
AA(?:(?!XX).)*ZZ
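For the AA / XX / ZZ problem in the question, a tempered greedy token would take the form `AA(?:(?!XX).)*ZZ`: before consuming each character, the lookahead checks that XX does not start at that position. The sketch below assumes that form. The helper name `find_tempered` is made up for illustration.

```python
import re

# Tempered greedy token: consume any char (except newline) as long as
# "XX" does not start at the current position.
pattern = re.compile(r"AA(?:(?!XX).)*ZZ")


def find_tempered(text):
    """Return the matched substring, or None if the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None


print(find_tempered("IYAABMDHRPONWUYZZ"))  # matches from AA through ZZ
print(find_tempered("BVAABDMYBXXWZZCKU"))  # XX in between: no match
print(find_tempered("AA b c ZZ"))          # unlike [^\sX], whitespace is allowed here
```

Compared with the character-class versions, this form generalizes more easily to longer forbidden strings, at the cost of one lookahead per consumed character.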
Posting my original comment to the question as an answer.
Apart from the "single-regex" solutions already posted, think about this solution:
First, find all matches of any text between AA and ZZ, for example with this regex: AA(.+)ZZ. Store all matches in a list.
Then, go through the list and filter out the matches that contain XX. You do not even need to use Regex for that, as most languages, including Python, have dedicated string methods for that.
What you get in return is a clean solution, without any complicated Regexes. It's easy to read, easy to maintain, and if any new conditions are to be added, they can be applied to the final result.
To support it with some code:
import re

test_str = """
IYAABMDHRPONWUYZZ
BVAABDMYBXXWZZCKU
"""

# First step: find all strings between AA and ZZ
# ("." does not match newlines by default, so each line is searched separately)
match_results = re.findall(r"AA(.+)ZZ", test_str, re.I)

# Second step: filter out the ones that contain XX
final_results = [match for match in match_results if "XX" not in match]

print(final_results)
As for the part assigned to final_results, it's called a list comprehension. Since it's not part of the question, I'll not explain it here.
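For readers unfamiliar with the construct, the filtering step can also be written as an explicit loop. This is only an equivalent sketch of the same two-step idea. The names `loop_results` and `comprehension_results` are introduced here for comparison.

```python
import re

test_str = """
IYAABMDHRPONWUYZZ
BVAABDMYBXXWZZCKU
"""

# Step 1: capture the text between AA and ZZ on each line.
match_results = re.findall(r"AA(.+)ZZ", test_str, re.I)

# Step 2, as an explicit loop: keep only captures without "XX".
loop_results = []
for match in match_results:
    if "XX" not in match:
        loop_results.append(match)

# The same filter written as a list comprehension.
comprehension_results = [m for m in match_results if "XX" not in m]

print(loop_results)
print(loop_results == comprehension_results)
```

Both forms produce the same list. The comprehension is simply the more compact spelling.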
My guess, though I am not sure, is that you might want to design an expression similar to:
^(?!.*(?=AA.*XX.*ZZ).*).*AA.*ZZ.*$
import re
regex = r"^(?!.*(?=AA.*XX.*ZZ).*).*AA.*ZZ.*$"
test_str = """
IYAABMDHRPONWUYZZ
BVAABDMYBXXWZZCKU
AABMDHRPONWUYXxXxXxZZ
"""
print(re.findall(regex, test_str, re.M))
Output
['IYAABMDHRPONWUYZZ', 'AABMDHRPONWUYXxXxXxZZ']
The expression is explained on the top right panel of regex101.com, if you wish to explore, simplify, or modify it. There you can also watch how it would match against some sample inputs, if you like.
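On simplifying: the inner positive lookahead and the trailing `.*` inside the negative lookahead appear to be redundant, so a shorter expression should behave the same. The sketch below compares the two on the sample input. The names `simplified` and `ignore_case_matches` are introduced here. Adding `re.I` is an assumption about what one might want for mixed-case input such as `Xx`, not something the question asked for.

```python
import re

original = r"^(?!.*(?=AA.*XX.*ZZ).*).*AA.*ZZ.*$"
# Assumed-equivalent shorter form: reject any line containing AA...XX...ZZ.
simplified = r"^(?!.*AA.*XX.*ZZ).*AA.*ZZ.*$"

test_str = """
IYAABMDHRPONWUYZZ
BVAABDMYBXXWZZCKU
AABMDHRPONWUYXxXxXxZZ
"""

original_matches = re.findall(original, test_str, re.M)
simplified_matches = re.findall(simplified, test_str, re.M)
print(original_matches == simplified_matches)

# With re.I, "Xx" also counts as XX, so the third line is rejected as well.
ignore_case_matches = re.findall(simplified, test_str, re.M | re.I)
print(ignore_case_matches)
```

Note that both expressions match whole lines rather than just the AA...ZZ substring, which differs from the other answers.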