
Pattern matching extract String Scala

I want to extract the part of a String that matches one of the two regex patterns I defined:

  //should match R0010, R0100,R0300 etc 
  val rPat="[R]{1}[0-9]{4}".r
  // should match P.25.01.21 , P.27.03.25 etc
  val pPat="[P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}".r 

I then define a method to extract the elements:

  val matcher= (s:String) => s match {case pPat(el)=> println(el) // print the P.25.01.25
                                        case rPat(el)=>println(el) // print R0100 
                                        case _ => println("no match")}

And test it, e.g., with:

  val pSt=" P.25.01.21 - Hello whats going on?"
  matcher(pSt)//prints "no match" but should print P.25.01.21
  val rSt= "R0010  test test 3,870" 
  matcher(rSt) //prints also "no match" but should print R0010
  //check if regex is wrong
  val pHead="P.25.01.21"
  pHead.matches(pPat.toString)//returns true
  val rHead="R0010"
  rHead.matches(rPat.toString)//return true

I'm not sure whether the regular expressions are wrong, since the matches method works on the isolated elements. So what is wrong with this approach?

When you use pattern matching with strings, you need to bear in mind that:

  • A regex used as an extractor in a match pattern must match the whole string; otherwise, no match is returned (the solution is to make the pattern unanchored with .r.unanchored).
  • Once you make it unanchored, watch out for unwanted matches: R[0-9]{4} will match R1234 in CSR123456 (the fix depends on your actual requirements; word boundaries \\b are usually enough, or negative lookarounds can be used).
  • Inside a match block, a regex extractor needs one capturing group for each variable you bind. pPat(el) and rPat(el) each expect exactly one group, but your patterns define none, so these cases can never match.
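To see the first and third points in isolation, here is a minimal sketch, assuming Scala 2.12 or later. The object ExtractorRules and the helper extract are hypothetical names used only for illustration. It compares a pattern with no capturing group, an anchored pattern with one group, and the same pattern made unanchored.

```scala
import scala.util.matching.Regex

object ExtractorRules {
  // Hypothetical helper: returns the single captured group if the extractor matches.
  // The pattern r(g) binds one variable, so r must define exactly one capturing group.
  def extract(r: Regex, s: String): Option[String] = s match {
    case r(g) => Some(g)
    case _    => None
  }

  val noGroup  = "R[0-9]{4}".r              // no capturing group, as in the question
  val anchored = "(R[0-9]{4})".r            // one group, but must match the whole string
  val loose    = "(R[0-9]{4})".r.unanchored // one group, may match anywhere in the string
  val input    = "R0010  test test 3,870"

  def main(args: Array[String]): Unit = {
    println(extract(noGroup, "R0010"))  // None: zero groups, one variable expected
    println(extract(anchored, input))   // None: the rest of the line is not matched
    println(extract(anchored, "R0010")) // Some(R0010)
    println(extract(loose, input))      // Some(R0010)
  }
}
```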

So, I suggest the following solution:

val rPat = """\b(R\d{4})\b""".r.unanchored
val pPat = """\b(P\.\d{2}\.\d{2}\.\d{2})\b""".r.unanchored

val matcher = (s: String) => s match {
  case pPat(el) => println(el) // prints the P code, e.g. P.25.01.21
  case rPat(el) => println(el) // prints the R code, e.g. R0010
  case _        => println("no match")
}

Then,

val pSt=" P.25.01.21 - Hello whats going on?"
matcher(pSt) // => P.25.01.21
val pSt2_bad=" CP.2334565.01124.212 - Hello whats going on?"
matcher(pSt2_bad) // => no match
val rSt= "R0010  test test 3,870" 
matcher(rSt) // => R0010
val rSt2_bad = "CSR00105  test test 3,870" 
matcher(rSt2_bad) // => no match

Some notes on the patterns:

  • \b - a leading word boundary
  • (R\d{4}) - a capturing group matching R followed by exactly 4 digits
  • \b - a trailing word boundary

Due to the triple quotes used to define the string literal, there is no need to escape the backslashes.
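The effect of the word boundaries can be checked directly with findFirstIn, which searches anywhere in the input. This is a minimal sketch; the object name WordBoundaryDemo and the sample strings are illustrative assumptions.

```scala
object WordBoundaryDemo {
  val plain   = """R\d{4}""".r       // no boundaries: matches inside longer tokens
  val bounded = """\bR\d{4}\b""".r   // boundaries: R code must stand as its own word
  val samples = List("R0010  test", "CSR00105  test", "see R0300.")

  def main(args: Array[String]): Unit =
    samples.foreach { s =>
      // findFirstIn returns the first match anywhere in s, or None
      println(s"'$s': plain=${plain.findFirstIn(s)}, bounded=${bounded.findFirstIn(s)}")
    }
}
```

For "CSR00105  test", the plain pattern finds R0010 inside the longer token, while the bounded pattern finds nothing.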

Introduce capturing groups in your patterns and surround them with .* so that the pattern matches the whole string:

val rPat=".*([R]{1}[0-9]{4}).*".r

val pPat=".*([P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}).*".r 

...

scala> matcher(pSt)
P.25.01.21

scala> matcher(rSt)
R0010
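One caveat with this approach, sketched below under the assumption that a line may contain more than one code: because the leading .* is greedy, the group captures the last code in the line, whereas an unanchored pattern returns the leftmost one. The object, method names, and sample line are hypothetical.

```scala
object GreedyGroupDemo {
  val wrapped    = ".*([R]{1}[0-9]{4}).*".r        // whole-string match, group in the middle
  val unanchored = "([R]{1}[0-9]{4})".r.unanchored  // searches for the first occurrence
  val line       = "R0001 refers to R0002"

  def viaWrapped(s: String): Option[String] = s match {
    case wrapped(el) => Some(el) // greedy leading .* leaves only the LAST code for the group
    case _           => None
  }

  def viaUnanchored(s: String): Option[String] = s match {
    case unanchored(el) => Some(el) // leftmost match wins
    case _              => None
  }

  def main(args: Array[String]): Unit = {
    println(viaWrapped(line))    // Some(R0002)
    println(viaUnanchored(line)) // Some(R0001)
  }
}
```

Using a reluctant prefix (.*?) would also make the wrapped pattern pick the first code.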

Written the following way, the code produces the desired output. The API documentation I followed is http://www.scala-lang.org/api/2.12.1/scala/util/matching/Regex.html

  // Wrapper object (name is arbitrary) so the snippet compiles as a program
  object Main {
    // should match R0010, R0100, R0300, etc.
    val rPat = "[R]{1}[0-9]{4}".r
    // should match P.25.01.21, P.27.03.25, etc.
    val pPat = "[P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}".r

    def main(args: Array[String]): Unit = {
      val pSt = " P.25.01.21 - Hello whats going on?"
      // findAllIn searches the whole input, so no anchoring or groups are needed
      pPat.findAllIn(pSt).foreach(println) // prints P.25.01.21
      val rSt = "R0010  test test 3,870"
      rPat.findAllIn(rSt).foreach(println) // prints R0010
    }
  }
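If a single result is wanted rather than every match, findFirstIn returns an Option[String], and the two patterns can be chained with orElse. This is a sketch; FindFirstDemo and firstCode are hypothetical names, and trying the P pattern before the R pattern simply mirrors the order of the cases in the question's matcher.

```scala
object FindFirstDemo {
  val rPat = "[R]{1}[0-9]{4}".r
  val pPat = "[P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}".r

  // Hypothetical helper: the first P code if present, otherwise the first R code.
  def firstCode(s: String): Option[String] =
    pPat.findFirstIn(s).orElse(rPat.findFirstIn(s))

  def main(args: Array[String]): Unit = {
    println(firstCode(" P.25.01.21 - Hello whats going on?")) // Some(P.25.01.21)
    println(firstCode("R0010  test test 3,870"))              // Some(R0010)
    println(firstCode("nothing here"))                        // None
  }
}
```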

Please let me know if that works for you.
