Regex for overlapping matches

For a linguistics project, I am trying to match all occurrences of one or two consonants between vowels in some text. I am writing a very simple matcher in PHP (preg_match_all), but once characters have been consumed by a match, they cannot be matched again.

The following pattern is very simple and should do the trick, but it only finds non-overlapping occurrences:

[aeiou](qu|[bcdfghjklmnprstvwxyz]{1,2})[aeiou]

For the word officiosior, offi and osi are returned, but ici is not, because its leading i is the trailing i of offi, which the first match has already consumed.
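To make the problem concrete, here is a minimal sketch in Python. Python is used only because its re module accepts the same pattern syntax as PHP's preg_* functions; this is not the asker's code, and the function name non_overlapping is made up for illustration. It runs the question's pattern over officiosior and collects the full matches:

```python
import re

# The question's pattern: a vowel, then "qu" or one or two consonants, then a vowel.
PATTERN = r"[aeiou](qu|[bcdfghjklmnprstvwxyz]{1,2})[aeiou]"

def non_overlapping(word):
    # finditer resumes scanning *after* the end of each match, so a character
    # that ended one match can never start the next one.
    return [m.group(0) for m in re.finditer(PATTERN, word)]

print(non_overlapping("officiosior"))  # ['offi', 'osi']
```

The i at index 3 closes offi, so the scan resumes at the c and ici is never tried.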

As far as I can tell, it's impossible to do, but is there a decent way to work around the issue?

You can use a positive lookahead assertion to achieve this.

(?=([aeiou](?:qu|[^aeiou]{1,2})[aeiou]))

A lookahead does not consume any characters of the string. After looking, the regular expression engine is back at the position where it started, and because the overall match has zero length, the next attempt begins one character further along. That is how the overlapping ici gets found. Since the full match itself is empty, the text you want is in capture group 1; with preg_match_all that is $matches[1].
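A small Python sketch of the lookahead approach, again as a stand-in for PHP (the function name overlapping is hypothetical). Each match is zero-width, so the looked-ahead text is read from group 1, and m.start() shows where each occurrence begins:

```python
import re

# The answer's pattern: the whole match is a zero-width lookahead, and the
# interesting text is captured in group 1.
OVERLAPPING = r"(?=([aeiou](?:qu|[^aeiou]{1,2})[aeiou]))"

def overlapping(word):
    # m.group(0) is always "" here; group 1 holds the looked-ahead substring.
    return [(m.start(), m.group(1)) for m in re.finditer(OVERLAPPING, word)]

print(overlapping("officiosior"))  # [(0, 'offi'), (3, 'ici'), (6, 'osi')]
```

The start positions 0, 3 and 6 show that ici begins on the same i that ends offi.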

Explanation:

(?=                    # look ahead to see if there is:
  (                    #   group and capture to \1:
    [aeiou]            #     any character of: 'a', 'e', 'i', 'o', 'u'
    (?:                #     group, but do not capture:
      qu               #       'qu'
     |                 #      OR
      [^aeiou]{1,2}    #       any character except: 'a', 'e', 'i', 'o', 'u' 
                       #       (between 1 and 2 times)
    )                  #     end of grouping
    [aeiou]            #     any character of: 'a', 'e', 'i', 'o', 'u'
  )                    #   end of \1
)                      # end of look-ahead
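One difference from the question's pattern may matter for a linguistics project: the answer replaces the explicit consonant list [bcdfghjklmnprstvwxyz] with [^aeiou], which also matches spaces, punctuation, digits and uppercase letters (including uppercase vowels). The Python sketch below compares the two classes inside the same lookahead; the names LOOSE and STRICT are just illustrative. If only lowercase consonants should count, putting the question's class back into the lookahead keeps the overlapping behaviour:

```python
import re

# Lookahead from the answer: [^aeiou] means "anything that is not a
# lowercase vowel", so a space can stand in for a consonant.
LOOSE = r"(?=([aeiou](?:qu|[^aeiou]{1,2})[aeiou]))"

# Same lookahead, but with the question's explicit consonant class.
STRICT = r"(?=([aeiou](?:qu|[bcdfghjklmnprstvwxyz]{1,2})[aeiou]))"

print(re.findall(LOOSE, "la bo"))         # ['a bo']  (the space counts as a consonant)
print(re.findall(STRICT, "la bo"))        # []
print(re.findall(STRICT, "officiosior"))  # ['offi', 'ici', 'osi']
```

Because each pattern has exactly one capture group, re.findall returns the group 1 strings directly.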


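Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP Filing No. 18138465 © 2020-2024 STACKOOM.COM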