
Scala regex pattern matching with String Interpolation

Since Scala 2.10, we can add a new method r to StringContext through an implicit class, like this:

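import scala.util.matching.Regex

// Adds an r member to StringContext so that r"..." can be used as a pattern
implicit class RegexContext(sc: StringContext) {
  def r = new Regex(sc.parts.mkString, sc.parts.tail.map(_ => "x"): _*)
}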

Then we can write a regex pattern directly after the case keyword, like this:

"123" match { 
   case r"\d+" => true 
   case _ => false 
}

What is not clear to me is how the implementation inside the implicit class RegexContext works.

Can someone explain the implementation of the method r, especially sc.parts.tail.map(_ => "x"): _*?

The implementation is taken from "How to pattern match using regular expression in Scala?"

Those arguments are group names, which are not very useful here.

scala 2.13.0-M5> implicit class R(sc: StringContext) { def r = sc.parts.mkString.r }
defined class R

scala 2.13.0-M5> "hello" match { case r"hell.*" => }

Compare:

scala 2.13.0-M5> implicit class R(sc: StringContext) { def r = sc.parts.mkString("(.*)").r }
defined class R

scala 2.13.0-M5> "hello" match { case r"hell$x" => x }
res5: String = o
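
To see why this works, it may help to spell out roughly what happens with an interpolated pattern. As a sketch (the object name DesugarSketch and its value names are made up for illustration), case r"hell$x" behaves approximately like calling unapplySeq on StringContext("hell", "").r, with each $name hole bound to the corresponding capture group:

```scala
import scala.util.matching.Regex

object DesugarSketch {
  // Same idea as the second R above: join the literal parts with a capture group.
  implicit class R(sc: StringContext) {
    def r: Regex = sc.parts.mkString("(.*)").r
  }

  // The pattern r"hell$x" has one hole, so its StringContext has two parts.
  val context: StringContext = StringContext("hell", "")

  // The regex source that r builds from those parts.
  val pattern: String = context.parts.mkString("(.*)")

  // Approximately what `case r"hell$x"` does: ask the Regex for its groups.
  val viaUnapplySeq: Option[List[String]] = context.r.unapplySeq("hello")

  // The same thing written as an ordinary pattern match.
  val viaMatch: String = "hello" match {
    case r"hell$x" => x
    case _         => "no match"
  }
}
```

Here the single capture group supplies the value bound to x. A pattern with no holes, such as r"hell.*", has a single part and binds nothing.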

The Regex constructor takes two arguments.

new Regex(regex: String, groupNames: String*)

The groupNames parameter is a vararg, so the names are optional. In this case they should have been left out, because that groupNames code is pretty useless.
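
As a small illustration of the vararg point, the sketch below (object and value names are hypothetical) shows what sc.parts.tail.map(_ => "x") evaluates to and how : _* expands a sequence into the vararg parameter. Newer Scala releases may warn that the group-name form of the constructor is deprecated.

```scala
import scala.util.matching.Regex

object VarargSketch {
  // A pattern such as r"a${x}b${y}c" would yield three parts and two holes.
  val context: StringContext = StringContext("a", "b", "c")

  // One "x" per part after the first, that is, one per hole.
  val names: Seq[String] = context.parts.tail.map(_ => "x")

  // Omitting groupNames entirely is legal because it is a vararg.
  val plain: Regex = new Regex("~(A(.)C)~")

  // `: _*` passes the elements of a Seq as individual vararg arguments,
  // so the next two constructions receive the same group names.
  val splatted: Regex = new Regex("~(A(.)C)~", names: _*)
  val explicit: Regex = new Regex("~(A(.)C)~", "x", "x")
}
```

The number of names depends only on the number of holes in the interpolated string, not on how many groups the pattern actually contains.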

Let's review what groupNames is supposed to do. We'll start without groupNames.

val rx = new Regex("~(A(.)C)~")  // pattern with 2 groups, no group names
rx.findAllIn("~ABC~").group(0) //res0: String = ~ABC~
rx.findAllIn("~ABC~").group(1) //res1: String = ABC
rx.findAllIn("~ABC~").group(2) //res2: String = B
rx.findAllIn("~ABC~").group(3) //java.lang.IndexOutOfBoundsException: No group 3

And now with groupNames.

val rx = new Regex("~(A(.)C)~", "x", "y", "z")  // 3 names given, but the pattern has only 2 groups
rx.findAllIn("~ABC~").group("x") //res0: String = ABC
rx.findAllIn("~ABC~").group("y") //res1: String = B
rx.findAllIn("~ABC~").group("z") //java.lang.IndexOutOfBoundsException: No group 3

So why is sc.parts.tail.map(_ => "x"): _* so useless? First, because the number of names created is unrelated to the number of groups in the pattern. Second, because it uses the same string, "x", for every name it specifies. That name then refers only to the last group it was given to.

val rx = new Regex("~(A(.)C)~", "x", "x")  // 2 groups named
rx.findAllIn("~ABC~").group("x") //res0: String = B (i.e. group(2))

...and...

val rx = new Regex("~(A(.)C)~", "x", "x", "x")  // 3 names given, but only 2 groups
rx.findAllIn("~ABC~").group("x") //java.lang.IndexOutOfBoundsException: No group 3
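
One way to picture why the last name wins: if names are mapped to group numbers by position, a repeated name simply overwrites its earlier entry. The sketch below illustrates that with an ordinary Map. It is not the library's actual code, and the helper nameToGroup is hypothetical:

```scala
object DuplicateNamesSketch {
  // Pair each name with its 1-based group number.
  // When a name repeats, toMap keeps the entry that comes last.
  def nameToGroup(names: Seq[String]): Map[String, Int] =
    names.zipWithIndex.map { case (name, i) => name -> (i + 1) }.toMap

  // "x" ends up pointing at group 2, the inner group of ~(A(.)C)~.
  val two: Map[String, Int] = nameToGroup(Seq("x", "x"))

  // "x" ends up pointing at group 3, which that pattern does not have.
  val three: Map[String, Int] = nameToGroup(Seq("x", "x", "x"))

  // Distinct names keep one entry per group.
  val distinct: Map[String, Int] = nameToGroup(Seq("outer", "inner"))
}
```

This is consistent with the two results above: with two names, "x" resolves to group 2, and with three names it resolves to a group that does not exist.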
