
Pattern matching extract String Scala

I want to extract the part of a String that matches one of the two regex patterns I defined:

  //should match R0010, R0100,R0300 etc 
  val rPat="[R]{1}[0-9]{4}".r
  // should match P.25.01.21 , P.27.03.25 etc
  val pPat="[P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}".r 

When I now define my method to extract the elements as:

  val matcher= (s:String) => s match {case pPat(el)=> println(el) // print the P.25.01.25
                                        case rPat(el)=>println(el) // print R0100 
                                        case _ => println("no match")}

And test it, e.g. with:

  val pSt=" P.25.01.21 - Hello whats going on?"
  matcher(pSt)//prints "no match" but should print P.25.01.21
  val rSt= "R0010  test test 3,870" 
  matcher(rSt) //prints also "no match" but should print R0010
  //check if regex is wrong
  val pHead="P.25.01.21"
  pHead.matches(pPat.toString)//returns true
  val rHead="R0010"
  rHead.matches(rPat.toString)//return true

I'm not sure whether the regular expressions are wrong, but the matches method works on the elements. So what is wrong with the approach?

When you use pattern matching with strings, you need to bear in mind that:

  • The .r pattern must match the whole string, otherwise no match is returned (the solution is to make the pattern unanchored with .r.unanchored).
  • Once the pattern is unanchored, watch out for unwanted matches: R[0-9]{4} will match R1234 in CSR123456 (the fix depends on your real requirements; usually word boundaries \b are enough, or negative lookarounds can be used).
  • Inside a match block, the regex needs a capturing group if you want to get a value back (the value you named el in pPat(el) and rPat(el)).
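The points above can be sketched in a few lines. This is a minimal illustration, not part of the original answer: the object name AnchoringDemo and the helper extract are hypothetical, and the pattern is simplified to a single capturing group.

```scala
import scala.util.matching.Regex

object AnchoringDemo {
  // Anchored (the default): in a `case`, the pattern must match the entire input.
  val anchored: Regex = "(R[0-9]{4})".r
  // Unanchored: the pattern may match anywhere inside the input.
  val loose: Regex = "(R[0-9]{4})".r.unanchored

  // Hypothetical helper: returns the captured group instead of printing it.
  def extract(re: Regex, s: String): Option[String] = s match {
    case re(code) => Some(code)
    case _        => None
  }

  def main(args: Array[String]): Unit = {
    println(extract(anchored, "R0010"))            // Some(R0010)
    println(extract(anchored, "R0010  test test")) // None: trailing text breaks the full match
    println(extract(loose, "R0010  test test"))    // Some(R0010)
    println(extract(loose, "CSR123456"))           // Some(R1234): the unwanted match
  }
}
```

The last call shows why an unanchored pattern usually needs word boundaries or lookarounds as well.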

So, I suggest the following solution:

val rPat="""\b(R\d{4})\b""".r.unanchored
val pPat="""\b(P\.\d{2}\.\d{2}\.\d{2})\b""".r.unanchored

val matcher= (s:String) => s match {case pPat(el)=> println(el) // print the P.25.01.25
    case rPat(el)=>println(el) // print R0100 
    case _ => println("no match")
}

Then,

val pSt=" P.25.01.21 - Hello whats going on?"
matcher(pSt) // => P.25.01.21
val pSt2_bad=" CP.2334565.01124.212 - Hello whats going on?"
matcher(pSt2_bad) // => no match
val rSt= "R0010  test test 3,870" 
matcher(rSt) // => R0010
val rSt2_bad = "CSR00105  test test 3,870" 
matcher(rSt2_bad) // => no match

Some notes on the patterns:

  • \b - a leading word boundary
  • (R\d{4}) - a capturing group matching R followed by exactly 4 digits
  • \b - a trailing word boundary

Due to the triple quotes used to define the string literal, there is no need to escape the backslashes.
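As a quick check of the quoting rule, the sketch below compares the triple-quoted pattern with its ordinary, escaped equivalent. The object name QuotingDemo and the value names are made up for illustration.

```scala
object QuotingDemo {
  // Triple-quoted literal: backslashes are taken literally.
  val raw: String = """\b(R\d{4})\b"""
  // Ordinary literal: every backslash has to be doubled.
  val escaped: String = "\\b(R\\d{4})\\b"

  def main(args: Array[String]): Unit = {
    println(raw == escaped) // true: both literals denote the same regex source
  }
}
```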

Introduce groups in your patterns:

val rPat=".*([R]{1}[0-9]{4}).*".r

val pPat=".*([P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}).*".r 

...

scala> matcher(pSt)
P.25.01.21

scala> matcher(rSt)
R0010
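One caveat with the .* wrapper, shown here as a small sketch rather than as part of the original answer: the leading .* is greedy, so when the input contains several codes the group captures the last one, whereas an unanchored pattern finds the first. The names GreedyDemo and capture are hypothetical.

```scala
import scala.util.matching.Regex

object GreedyDemo {
  // The pattern from the answer above: anchored, padded with .* on both sides.
  val greedy: Regex = ".*([R]{1}[0-9]{4}).*".r
  // For comparison: the same group, unanchored and without the padding.
  val firstOnly: Regex = "([R]{1}[0-9]{4})".r.unanchored

  // Hypothetical helper: returns the captured group, if any.
  def capture(re: Regex, s: String): Option[String] = s match {
    case re(code) => Some(code)
    case _        => None
  }

  def main(args: Array[String]): Unit = {
    val s = "R0010 then R0020"
    println(capture(greedy, s))    // Some(R0020): the greedy .* skips the first code
    println(capture(firstOnly, s)) // Some(R0010)
  }
}
```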

If the code is written in the following way, it produces the desired output. The API documentation I followed is http://www.scala-lang.org/api/2.12.1/scala/util/matching/Regex.html

  //should match R0010, R0100,R0300 etc
  val rPat="[R]{1}[0-9]{4}".r
  // should match P.25.01.21 , P.27.03.25 etc
  val pPat="[P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}".r


  def main(args: Array[String]): Unit = {
    val pSt=" P.25.01.21 - Hello whats going on?"
    // findAllIn scans the input, so the pattern does not have to match the whole string
    val pPatMatches = pPat.findAllIn(pSt)
    pPatMatches.foreach(println)
    val rSt= "R0010  test test 3,870"
    val rPatMatches = rPat.findAllIn(rSt)
    rPatMatches.foreach(println)
  }

Please let me know if that works for you.
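If only the first occurrence is needed, findFirstIn from the same Regex API returns an Option[String] instead of an iterator. The sketch below reuses the two patterns from the question; the object name FirstMatchDemo and the helper firstCode are hypothetical.

```scala
object FirstMatchDemo {
  private val rPat = "[R]{1}[0-9]{4}".r
  private val pPat = "[P]{1}[.]{1}[0-9]{2}[.]{1}[0-9]{2}[.]{1}[0-9]{2}".r

  // Hypothetical helper: try the P pattern first, then fall back to the R pattern.
  def firstCode(s: String): Option[String] =
    pPat.findFirstIn(s).orElse(rPat.findFirstIn(s))

  def main(args: Array[String]): Unit = {
    println(firstCode(" P.25.01.21 - Hello whats going on?")) // Some(P.25.01.21)
    println(firstCode("R0010  test test 3,870"))              // Some(R0010)
    println(firstCode("nothing to see here"))                 // None
  }
}
```

Returning an Option leaves it to the caller to decide what to do when nothing matches, rather than printing "no match" inside the function.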
