How do I create a regular expression that finds the first inner pair of words?

Question

I would like to write a regular expression that captures the first inner pair of delimiter words. The code below works in one case but not in another, where it captures the last pair instead.

Please see my code below.

import re

def testReplaceBetweenWords():

    head_dlmt='Head'
    tail_dlmt='Tail'

    line0 = "abc_Head_def_Head_inner_inside_Tail_ghi_Tail_jkl"
    line1 = "abc_Head_first_Tail_ghi_Head_second_Tail_opq"

    between_pattern = "(^.*(?<={0}))(?!.*{0}).*?(?={1})(.*)$".format(head_dlmt, tail_dlmt)
    compiled_pattern = re.compile(between_pattern)

    # Case 0: good case: It captures the first inner words.    
    result0 = re.search(compiled_pattern, line0)  

    print("original 0    : {0}".format(result0.group(0)))
    print("expected Head : abc_Head_def_Head")
    print("found Head    : {0}".format(result0.group(1)))
    print("expected Tail :                                Tail_ghi_Tail_jkl")
    print("found Tail    : {0}{1}".format(' ' * (result0.regs[2][0]), result0.group(2)))

    print()

    # Case 1: Bad case: It captures the last pair words.    
    result1 = re.search(compiled_pattern, line1)

    print("original 1    : {0}".format(result1.group(0)))
    print("expected Head : abc_Head")
    print("found Head    : {0}".format(result1.group(1)))
    print("expected Tail :                Tail_ghi_Head_second_Tail_opq")
    print("found Tail    : {0}{1}".format(' ' * (result1.regs[2][0]), result1.group(2)))

The output is as follows.

original 0    : abc_Head_def_Head_inner_inside_Tail_ghi_Tail_jkl
expected Head : abc_Head_def_Head
found Head    : abc_Head_def_Head
expected Tail :                                Tail_ghi_Tail_jkl
found Tail    :                                Tail_ghi_Tail_jkl

original 1    : abc_Head_first_Tail_ghi_Head_second_Tail_opq
expected Head : abc_Head
found Head    : abc_Head_first_Tail_ghi_Head
expected Tail :                Tail_ghi_Head_second_Tail_opq
found Tail    :                                     Tail_opq

The first case works well: it captures the first inner pair of words. The second case does not work: it captures the last pair, but I expected the first pair. How can I write a regular expression that satisfies both cases above?

Thank you very much.

Answer 1

Use the following regex:

between_pattern = "^((?:(?!{1}).)*{0}).*?({1}.*)$".format(head_dlmt, tail_dlmt)

See the online Python demo and the regex demo.

Details

The first .* pattern should be replaced with a tempered greedy token, (?:(?!{1}).)*, which matches zero or more characters, none of which starts the end delimiter sequence. The first group therefore extends only up to the last Head that has no Tail before it.
There is no point in using lookarounds inside the capturing groups: the delimiters are meant to be part of those groups, so they can simply be matched there.

Note that you may want to compile the regex with the re.S flag to support strings with line breaks.
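As a rough check of the pattern above, here is a minimal sketch that runs it against the two lines from the question plus one string with a line break. The function name split_around_first_inner_pair and the use of re.escape on the delimiters are my own additions for illustration, not part of the original answer.

```python
import re

head_dlmt = "Head"  # delimiters taken from the question
tail_dlmt = "Tail"

# re.escape is an added precaution (an assumption, not in the answer) in case
# a delimiter ever contains regex metacharacters.
# re.S lets "." match newlines, as the note above suggests.
pattern = re.compile(
    "^((?:(?!{1}).)*{0}).*?({1}.*)$".format(
        re.escape(head_dlmt), re.escape(tail_dlmt)
    ),
    re.S,
)


def split_around_first_inner_pair(line):
    """Return (head_part, tail_part), or None when no pair is found.

    Hypothetical helper name, used only for this sketch.
    """
    m = pattern.search(line)
    return (m.group(1), m.group(2)) if m else None


print(split_around_first_inner_pair(
    "abc_Head_def_Head_inner_inside_Tail_ghi_Tail_jkl"))
# ('abc_Head_def_Head', 'Tail_ghi_Tail_jkl')

print(split_around_first_inner_pair(
    "abc_Head_first_Tail_ghi_Head_second_Tail_opq"))
# ('abc_Head', 'Tail_ghi_Head_second_Tail_opq')

# With re.S the inner text may span a line break.
print(split_around_first_inner_pair("abc_Head_in\nner_Tail_x"))
# ('abc_Head', 'Tail_x')
```

Both lines from the question give the "expected Head" and "expected Tail" values listed there.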

Answer 2

Another option is to match (almost) exactly the text you want:

Use this regex and extract the first match:

(?<=Head)(?:(?!Head|Tail).)+(?=Tail)

In your case, use:

between_pattern = '(?<={0})(?:(?!{0}|{1}).)+(?={1})'.format(head_dlmt, tail_dlmt)

Better still, with this regex you can extract the second, the third, ... the nth match just as easily as the first, without any changes to the pattern, so it is more flexible.

See here:

https://regex101.com/r/ds90y4/1/
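To illustrate the "nth match" point, here is a small sketch that collects every innermost Head...Tail text with re.findall. The helper name inner_texts and the re.escape calls are assumptions added for the example. Note that this pattern returns the text between the delimiters, not the prefix and suffix that the question's code prints.

```python
import re

head_dlmt = "Head"  # delimiters taken from the question
tail_dlmt = "Tail"

# The lookbehind needs a fixed-width pattern in Python's re module;
# a literal (escaped) delimiter satisfies that.
between = re.compile(
    "(?<={0})(?:(?!{0}|{1}).)+(?={1})".format(
        re.escape(head_dlmt), re.escape(tail_dlmt)
    )
)


def inner_texts(line):
    """Return all texts enclosed by an innermost Head/Tail pair, in order.

    Hypothetical helper name, used only for this sketch.
    """
    return between.findall(line)


line0 = "abc_Head_def_Head_inner_inside_Tail_ghi_Tail_jkl"
line1 = "abc_Head_first_Tail_ghi_Head_second_Tail_opq"

print(inner_texts(line0))  # ['_inner_inside_']
print(inner_texts(line1))  # ['_first_', '_second_']

# The first match is element 0, the nth match is element n - 1.
print(inner_texts(line1)[0])  # _first_
```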

Answer 1: score 3, accepted, 2018-05-04 08:19:43
Answer 2: score 0, 2018-05-04 09:21:46
