
How can I make a regular expression which finds the first inner pair of words?

I would like to make a regular expression that captures the first inner pair of delimiter words. My code below works in one case but not in another, where it captures the last pair of words instead.

Please see my code below.

import re

def testReplaceBetweenWords():

    head_dlmt='Head'
    tail_dlmt='Tail'

    line0 = "abc_Head_def_Head_inner_inside_Tail_ghi_Tail_jkl"
    line1 = "abc_Head_first_Tail_ghi_Head_second_Tail_opq"

    between_pattern = "(^.*(?<={0}))(?!.*{0}).*?(?={1})(.*)$".format(head_dlmt, tail_dlmt)
    compiled_pattern = re.compile(between_pattern)

    # Case 0: good case: It captures the first inner words.    
    result0 = re.search(compiled_pattern, line0)  

    print("original 0    : {0}".format(result0.group(0)))
    print("expected Head : abc_Head_def_Head")
    print("found Head    : {0}".format(result0.group(1)))
    print("expected Tail :                                Tail_ghi_Tail_jkl")
    print("found Tail    : {0}{1}".format(' ' * (result0.regs[2][0]), result0.group(2)))

    print()

    # Case 1: Bad case: It captures the last pair words.    
    result1 = re.search(compiled_pattern, line1)

    print("original 1    : {0}".format(result1.group(0)))
    print("expected Head : abc_Head")
    print("found Head    : {0}".format(result1.group(1)))
    print("expected Tail :                Tail_ghi_Head_second_Tail_opq")
    print("found Tail    : {0}{1}".format(' ' * (result1.regs[2][0]), result1.group(2)))

The output is as follows.

original 0    : abc_Head_def_Head_inner_inside_Tail_ghi_Tail_jkl
expected Head : abc_Head_def_Head
found Head    : abc_Head_def_Head
expected Tail :                                Tail_ghi_Tail_jkl
found Tail    :                                Tail_ghi_Tail_jkl

original 1    : abc_Head_first_Tail_ghi_Head_second_Tail_opq
expected Head : abc_Head
found Head    : abc_Head_first_Tail_ghi_Head
expected Tail :                Tail_ghi_Head_second_Tail_opq
found Tail    :                                     Tail_opq

The first case works well: it captures the first inner pair of words. The second case does not work: it captures the last pair of words, but I expected the first pair. How can I write a regular expression that satisfies both cases above?

Thank you very much.

Use the following regex:

between_pattern = "^((?:(?!{1}).)*{0}).*?({1}.*)$".format(head_dlmt, tail_dlmt)

See the online Python demo and the regex demo.

Details

  • The first .* pattern should be replaced with a tempered greedy token, (?:(?!{1}).)*, which matches any 0+ chars that do not start the end delimiter character sequence (thus, you match up to the last Head that has no Tail before it)
  • There is no point in using lookarounds around the capturing groups, since the delimiters are meant to be part of those captured groups anyway
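
As a quick check, the sketch below runs this pattern against the two sample lines from the question. The helper name `split_on_first_inner_pair` is made up for illustration, and wrapping the delimiters in `re.escape` is an added assumption (not part of the answer) so that delimiters containing regex metacharacters are treated literally.

```python
import re

def split_on_first_inner_pair(line, head_dlmt="Head", tail_dlmt="Tail"):
    # Hypothetical helper for illustration; not part of the original answer.
    head = re.escape(head_dlmt)  # assumption: delimiters are literal text
    tail = re.escape(tail_dlmt)
    # Group 1: everything up to the last head that precedes the first tail.
    # Group 2: the first tail and everything after it.
    pattern = "^((?:(?!{1}).)*{0}).*?({1}.*)$".format(head, tail)
    match = re.search(pattern, line)
    if match is None:
        return None
    return match.group(1), match.group(2)

line0 = "abc_Head_def_Head_inner_inside_Tail_ghi_Tail_jkl"
line1 = "abc_Head_first_Tail_ghi_Head_second_Tail_opq"

print(split_on_first_inner_pair(line0))
# ('abc_Head_def_Head', 'Tail_ghi_Tail_jkl')
print(split_on_first_inner_pair(line1))
# ('abc_Head', 'Tail_ghi_Head_second_Tail_opq')
```

Both results match the "expected Head" and "expected Tail" values printed in the question.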

Note that you may want to compile the regex with the re.S flag to support strings with line breaks.
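
To see why the flag matters, here is a small sketch with a made-up input that has a line break between the delimiters. Without `re.S` the dot cannot cross the newline, so the pattern finds nothing; with `re.S` it matches as before.

```python
import re

# Same pattern as above with the delimiters filled in.
pattern = "^((?:(?!Tail).)*Head).*?(Tail.*)$"

# Hypothetical sample: a newline sits between Head and Tail.
text = "abc_Head_first\n_Tail_ghi"

without_flag = re.search(pattern, text)
with_flag = re.search(pattern, text, re.S)

print(without_flag)        # None
print(with_flag.groups())  # ('abc_Head', 'Tail_ghi')
```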

Another option could be to match (almost) exactly what you want to match.

Use this regex and extract the first match:

(?<=Head)(?:(?!Head|Tail).)+(?=Tail)

In your case, use:

between_pattern = '(?<={0})(?:(?!{0}|{1}).)+(?={1})'.format(head_dlmt, tail_dlmt)

Even better: with this regex you can extract the second, the third, ... the nth match just as easily as the first, without changing the pattern at all, so it is more flexible.

See here:

https://regex101.com/r/ds90y4/1/
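
To illustrate the point about picking the nth match, the sketch below applies the pattern to the two sample lines from the question with `re.findall`. Note that it returns the text between the innermost delimiters rather than the head and tail parts the question asks for, which is presumably what "almost" refers to.

```python
import re

head_dlmt = "Head"
tail_dlmt = "Tail"
between_pattern = "(?<={0})(?:(?!{0}|{1}).)+(?={1})".format(head_dlmt, tail_dlmt)

line0 = "abc_Head_def_Head_inner_inside_Tail_ghi_Tail_jkl"
line1 = "abc_Head_first_Tail_ghi_Head_second_Tail_opq"

# The pattern has no capturing groups, so findall returns the full matches.
matches0 = re.findall(between_pattern, line0)
matches1 = re.findall(between_pattern, line1)

print(matches0)     # ['_inner_inside_']
print(matches1)     # ['_first_', '_second_']
print(matches1[1])  # the second match: '_second_'
```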


