How can I make a regular expression which finds the first inner pair of words?
I would like to write a regular expression that captures the first inner pair of delimiter words. My code below works in one case but not in another, where it captures the last pair instead.
Please see my code below.
    import re

    def testReplaceBetweenWords():
        head_dlmt = 'Head'
        tail_dlmt = 'Tail'
        line0 = "abc_Head_def_Head_inner_inside_Tail_ghi_Tail_jkl"
        line1 = "abc_Head_first_Tail_ghi_Head_second_Tail_opq"
        between_pattern = "(^.*(?<={0}))(?!.*{0}).*?(?={1})(.*)$".format(head_dlmt, tail_dlmt)
        compiled_pattern = re.compile(between_pattern)

        # Case 0: good case: it captures the first inner words.
        result0 = re.search(compiled_pattern, line0)
        print("original 0 : {0}".format(result0.group(0)))
        print("expected Head : abc_Head_def_Head")
        print("found Head : {0}".format(result0.group(1)))
        print("expected Tail : Tail_ghi_Tail_jkl")
        # Pad with spaces so group 2 lines up under its position in the original string.
        print("found Tail : {0}{1}".format(' ' * result0.start(2), result0.group(2)))
        print()

        # Case 1: bad case: it captures the last pair of words.
        result1 = re.search(compiled_pattern, line1)
        print("original 1 : {0}".format(result1.group(0)))
        print("expected Head : abc_Head")
        print("found Head : {0}".format(result1.group(1)))
        print("expected Tail : Tail_ghi_Head_second_Tail_opq")
        print("found Tail : {0}{1}".format(' ' * result1.start(2), result1.group(2)))

    testReplaceBetweenWords()
The output is as follows.
original 0 : abc_Head_def_Head_inner_inside_Tail_ghi_Tail_jkl
expected Head : abc_Head_def_Head
found Head : abc_Head_def_Head
expected Tail : Tail_ghi_Tail_jkl
found Tail : Tail_ghi_Tail_jkl
original 1 : abc_Head_first_Tail_ghi_Head_second_Tail_opq
expected Head : abc_Head
found Head : abc_Head_first_Tail_ghi_Head
expected Tail : Tail_ghi_Head_second_Tail_opq
found Tail : Tail_opq
The first case works well: it captures the first inner pair of words. The second case does not work: it captures the last pair of words, but I expected the first pair. How can I write a regular expression that satisfies both cases above?
Thank you very much.
Use the following regex:
between_pattern = "^((?:(?!{1}).)*{0}).*?({1}.*)$".format(head_dlmt, tail_dlmt)
See the online Python demo and the regex demo.
Details
The first .* pattern should be replaced with a tempered greedy token, (?:(?!{1}).)*, which matches any 0+ chars that do not start the end delimiter character sequence. As a result, group 1 can only extend up to the last Head that has no Tail before it.
Note that you may want to compile the regex with the re.S flag to support strings with line breaks.
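To make the tempered greedy token concrete, here is a minimal sketch that wraps the regex above in a helper and runs it on the two strings from the question. The function name split_around_first_inner_pair and the use of re.escape on the delimiters are illustrative assumptions, not part of the original answer.

```python
import re

def split_around_first_inner_pair(line, head="Head", tail="Tail"):
    """Return (text up to and including the inner head, text from its tail onward)."""
    # Assumption: escape the delimiters in case they contain regex metacharacters.
    h, t = re.escape(head), re.escape(tail)
    # Group 1: chars that never start the tail, ending at the last head before it.
    # Group 2: the first tail after that head and the rest of the string.
    pattern = re.compile("^((?:(?!{1}).)*{0}).*?({1}.*)$".format(h, t), re.S)
    match = pattern.search(line)
    return match.groups() if match else None

line0 = "abc_Head_def_Head_inner_inside_Tail_ghi_Tail_jkl"
line1 = "abc_Head_first_Tail_ghi_Head_second_Tail_opq"

print(split_around_first_inner_pair(line0))
# ('abc_Head_def_Head', 'Tail_ghi_Tail_jkl')
print(split_around_first_inner_pair(line1))
# ('abc_Head', 'Tail_ghi_Head_second_Tail_opq')
```

Because the token refuses to step over a Tail, the greedy match stops just before the first Tail and then backtracks to the nearest Head, which is the inner one in both cases.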
Another option is to match (almost) exactly the text you want.
Use this regex, and extract the first match:
(?<=Head)(?:(?!Head|Tail).)+(?=Tail)
In your case, use:
between_pattern = '(?<={0})(?:(?!{0}|{1}).)+(?={1})'.format(head_dlmt, tail_dlmt)
What is more, with this regex you can extract the second, the third, ... the nth match just as easily as the first, without changing the pattern at all, so it is more flexible.
See here:
https://regex101.com/r/ds90y4/1/
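As a rough illustration of extracting the first or the nth match with this second pattern, the sketch below uses re.findall. The helper name inner_pairs and the re.escape calls are assumptions added for the example.

```python
import re

def inner_pairs(line, head="Head", tail="Tail"):
    """Return every run of text enclosed by an innermost head/tail pair."""
    # Assumption: escape the delimiters in case they contain regex metacharacters.
    h, t = re.escape(head), re.escape(tail)
    # The lookarounds keep the delimiters out of the match; the tempered token
    # rejects any run that would cross another head or tail.
    pattern = re.compile("(?<={0})(?:(?!{0}|{1}).)+(?={1})".format(h, t), re.S)
    return pattern.findall(line)

line0 = "abc_Head_def_Head_inner_inside_Tail_ghi_Tail_jkl"
line1 = "abc_Head_first_Tail_ghi_Head_second_Tail_opq"

print(inner_pairs(line0))     # ['_inner_inside_']
print(inner_pairs(line1))     # ['_first_', '_second_']
print(inner_pairs(line1)[0])  # _first_
```

Note that the + quantifier requires at least one character between the delimiters, so a Head immediately followed by a Tail is not reported.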