Overlapping matches in Regex - Scala

I'm trying to extract all possible 3-letter combinations from a String that follow the pattern XYX (the first and third letters are the same, and the middle letter is different).

val text = "abaca dedfd ghgig"
val p = """([a-z])(?!\1)[a-z]\1""".r
p.findAllIn(text).toArray

When I run the script, I get:

aba, ded, ghg

But the expected result is:

aba, aca, ded, dfd, ghg, gig
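To make the expected result concrete, it helps to note that "overlapping" here simply means checking every 3-character window of the string. The following is a minimal cross-check sketch, not part of the original question; the object name `ExpectedXyx` and the helper `xyxWindows` are hypothetical, and it assumes only lowercase ASCII letters count, as with `[a-z]`.

```scala
object ExpectedXyx {
  // Hypothetical helper (not from the original post): inspects every
  // 3-character window, so overlapping candidates are naturally included.
  def xyxWindows(s: String): List[String] =
    s.sliding(3).filter { w =>
      w.length == 3 &&                          // sliding yields a shorter window for short input
      w.forall(c => c >= 'a' && c <= 'z') &&    // letters a-z only, like [a-z]
      w(0) == w(2) &&                           // X ... X
      w(0) != w(1)                              // middle letter differs from X
    }.toList

  def main(args: Array[String]): Unit = {
    println(xyxWindows("abaca dedfd ghgig").mkString(", "))
    // prints: aba, aca, ded, dfd, ghg, gig
  }
}
```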

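It does not detect overlapping combinations: once `aba` is matched, the regex engine resumes searching right after it, so `aca`, which reuses that final `a`, is never tried.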
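One way to see why `aca`, `dfd` and `gig` are skipped is to print where each match starts and ends. This is a small diagnostic sketch, not from the original post (the object name `NonOverlapping` is hypothetical); it uses the question's pattern unchanged and shows that each new search begins at the end index of the previous match.

```scala
object NonOverlapping {
  // The question's pattern: a letter X, then a different letter, then X again.
  val p = """([a-z])(?!\1)[a-z]\1""".r

  // Returns (matched text, start index, end index) for every match found.
  def spans(s: String): List[(String, Int, Int)] =
    p.findAllMatchIn(s).map(m => (m.matched, m.start, m.end)).toList

  def main(args: Array[String]): Unit = {
    spans("abaca dedfd ghgig").foreach(println)
    // (aba,0,3)
    // (ded,6,9)
    // (ghg,12,15)
  }
}
```

After `aba` ends at index 3, the search continues from `c`, so the `a` at index 2 that starts `aca` is already consumed.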

The way to do this is to enclose the whole pattern in a lookahead, so that each match consumes only its start position:

val p = """(?=(([a-z])(?!\2)[a-z]\2))""".r
p.findAllIn(text).matchData foreach {
   m => println(m.group(1))
}

The lookahead is only an assertion (a test) at the current position, and the pattern inside it doesn't consume any characters. The result you are looking for is in the first capture group; you need that group to get the result, because the whole match itself is empty.
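To illustrate that the overall match is empty while Group 1 carries the text, here is a minimal sketch (the object name `LookaheadMatches` is hypothetical) that reports the whole match, Group 1 and the start index for each hit. Because every match is zero-length, the underlying Java matcher advances one character before searching again, which is what lets overlapping candidates be found.

```scala
object LookaheadMatches {
  // Whole pattern wrapped in a positive lookahead; Group 1 captures the XYX text,
  // Group 2 captures the single letter X used by the backreference.
  val p = """(?=(([a-z])(?!\2)[a-z]\2))""".r

  // For each match: the overall matched text (always empty), Group 1, and the start index.
  def details(s: String): List[(String, String, Int)] =
    p.findAllMatchIn(s).map(m => (m.matched, m.group(1), m.start)).toList

  def main(args: Array[String]): Unit = {
    details("abaca dedfd ghgig").foreach { case (whole, g1, start) =>
      println(s"start=$start whole='$whole' group1=$g1")
    }
  }
}
```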

You need to capture the whole pattern and put it inside a positive lookahead. In Scala, the code looks like this:

object Main extends App {
    val text = "abaca dedfd ghgig"
    val p = """(?=(([a-z])(?!\2)[a-z]\2))""".r
    val allMatches = p.findAllMatchIn(text).map(_.group(1))
    println(allMatches.mkString(", "))
    // => aba, aca, ded, dfd, ghg, gig
}

See the online Scala demo.

Note that the backreference changes to \2 (written "\\2" in an ordinary, non-triple-quoted string literal), because the letter to check is now captured by Group 2, while Group 1 wraps the whole pattern and contains the value you need to collect.
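For comparison, here is a sketch of the same regex written in an ordinary string literal instead of a triple-quoted one (the object name `EscapedLiteral` is hypothetical). Every regex backslash has to be doubled, so the backreference appears as `\\2` in source, but the compiled pattern is identical.

```scala
object EscapedLiteral {
  // Ordinary string literal: "\\2" in source becomes \2 in the actual regex.
  val p = "(?=(([a-z])(?!\\2)[a-z]\\2))".r

  // Collects Group 1 (the XYX text) from every zero-length lookahead match.
  def xyx(s: String): List[String] =
    p.findAllMatchIn(s).map(_.group(1)).toList

  def main(args: Array[String]): Unit = {
    println(xyx("abaca dedfd ghgig").mkString(", "))
    // prints: aba, aca, ded, dfd, ghg, gig
  }
}
```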
