
Overlapping matches in Regex - Scala

I'm trying to extract all possible combinations of 3 letters that follow the pattern XYX (a letter X, then a different letter Y, then X again) from a String.

val text = "abaca dedfd ghgig"
val p = """([a-z])(?!\1)[a-z]\1""".r
p.findAllIn(text).toArray

When I run the script I get:

aba, ded, ghg

But it should be:

aba, aca, ded, dfd, ghg, gig

It does not detect overlapping combinations.
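The cause is that `findAllIn` resumes each search at the end of the previous match. The `a` that ends `aba` has already been consumed, so it cannot also start `aca`. The sketch below (the object and method names are hypothetical) prints each match with its start and end offsets so you can see where the scanner resumes:

```scala
import scala.util.matching.Regex

object NonOverlapping {
  // The pattern from the question: X, then a different letter Y, then X again.
  val p: Regex = """([a-z])(?!\1)[a-z]\1""".r

  // Returns each match together with its [start, end) offsets in the text.
  def spans(text: String): List[(String, Int, Int)] =
    p.findAllMatchIn(text).map(m => (m.matched, m.start, m.end)).toList

  def main(args: Array[String]): Unit =
    // Each new search begins at the previous match's end, so "aca" (start 2)
    // is skipped: index 2 was already consumed by "aba" (end 3).
    spans("abaca dedfd ghgig").foreach { case (s, start, end) =>
      println(s"$s [$start, $end)")
    }
}
```

For the question's input this prints `aba [0, 3)`, `ded [6, 9)` and `ghg [12, 15)`.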

The solution is to enclose the whole pattern in a lookahead so that each match consumes only its start position:

val p = """(?=(([a-z])(?!\2)[a-z]\2))""".r
p.findAllIn(text).matchData foreach {
   m => println(m.group(1))
}

A lookahead is only an assertion (a test) at the current position: the pattern inside it does not consume any characters. The result you are looking for is therefore in the first capture group, which is required because the overall match is always an empty string.
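To confirm that the overall match really is empty, this sketch (hypothetical names) records each match's start offset, its `matched` text and group 1. Every `matched` value is the empty string. After an empty match, the underlying Java matcher advances one character before searching again, so a match is reported at every position where an XYX triple begins:

```scala
import scala.util.matching.Regex

object ZeroWidth {
  // Whole XYX pattern wrapped in a lookahead; group 1 captures the triple.
  val p: Regex = """(?=(([a-z])(?!\2)[a-z]\2))""".r

  // (start offset, text consumed by the match, captured triple)
  def details(text: String): List[(Int, String, String)] =
    p.findAllMatchIn(text).map(m => (m.start, m.matched, m.group(1))).toList

  def main(args: Array[String]): Unit =
    details("abaca dedfd ghgig").foreach(println)
}
```

The start offsets come out as 0, 2, 6, 8, 12 and 14, one for each of the six expected triples.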

You need to capture the whole pattern and put it inside a positive lookahead. In Scala, the complete program looks like this:

object Main extends App {
    val text = "abaca dedfd ghgig"
    val p = """(?=(([a-z])(?!\2)[a-z]\2))""".r
    val allMatches = p.findAllMatchIn(text).map(_.group(1))
    println(allMatches.mkString(", "))
    // => aba, aca, ded, dfd, ghg, gig
}


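Note that the backreference becomes \2 because the letter to check is now captured by Group 2, while Group 1 holds the value you need to collect. In a triple-quoted string it is written \2; in an ordinary string literal the backslash must be escaped, giving \\2.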
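The following sketch (hypothetical names) prints both groups for each match. It also checks that an ordinary string literal using `\\2` produces the same pattern string as the triple-quoted version:

```scala
import scala.util.matching.Regex

object GroupIds {
  // Triple-quoted: backslashes are literal, so the backreference is written \2.
  val tripleQuoted: Regex = """(?=(([a-z])(?!\2)[a-z]\2))""".r
  // Ordinary string literal: each backslash must be escaped, giving \\2.
  val escaped: Regex = "(?=(([a-z])(?!\\2)[a-z]\\2))".r

  // Group 1 is the whole XYX triple; group 2 is the repeated letter X.
  def groups(text: String): List[(String, String)] =
    tripleQuoted.findAllMatchIn(text).map(m => (m.group(1), m.group(2))).toList

  def main(args: Array[String]): Unit = {
    println(tripleQuoted.regex == escaped.regex) // true: same underlying pattern string
    groups("abaca dedfd").foreach { case (triple, x) => println(s"$triple (X = $x)") }
  }
}
```

For `"abaca dedfd"` this prints `true`, followed by `aba (X = a)`, `aca (X = a)`, `ded (X = d)` and `dfd (X = d)`.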
