
Scala regex pattern match of IP address

I can't understand why this code returns false:

      val reg = """.*(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}).*""".r
      "ttt20.30.4.140ttt" match{
        case reg(one, two, three, four) =>
          if (host == one + "." + two + "." + three + "." + four) true else false
        case _ => false
      }

and only if I change it to:

  val reg = """.*(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}).*""".r
  "20.30.4.140" match{
    case reg(one, two, three, four) =>
      if (host == one + "." + two + "." + three + "." + four) true else false
    case _ => false
  }

it does match.

Your variant

def main( args: Array[String] ) : Unit = {
  val regex = """.*(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}).*""".r
  val x = "ttt20.30.4.140ttt"

  x match {
    case regex(ip1,ip2,ip3,ip4) => println(ip1, ip2, ip3, ip4)
    case _ => println("No match.")
  }
}

matches, but not as you intend. The result will be (0,30,4,140) instead of (20,30,4,140). As you can see, .* is greedy, so it consumes as much input as it can while still letting the rest of the pattern match.

E.g. ab12 could be split by .*(\d{1,3}) into

  • ab and 12
  • ab1 and 2 .... this is the variant chosen, as .* consumes as much input as it can
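The ab12 split can be checked directly. The following is only a minimal sketch: the object name `GreedyVsReluctant` and the helper `capture` are made up for illustration, and the pattern is the question's pattern cut down to a single group.

```scala
import scala.util.matching.Regex

// Hypothetical demo object, not part of the original question or answer.
object GreedyVsReluctant {
  // Greedy prefix: .* takes as much as it can, leaving one digit for the group.
  val greedy: Regex = """.*(\d{1,3})""".r
  // Reluctant prefix: .*? takes as little as it can, leaving both digits.
  val reluctant: Regex = """.*?(\d{1,3})""".r

  // Returns the text captured by the single group if the whole input matches.
  def capture(pattern: Regex, input: String): Option[String] =
    input match {
      case pattern(digits) => Some(digits)
      case _               => None
    }

  def main(args: Array[String]): Unit = {
    println(capture(greedy, "ab12"))    // Some(2)
    println(capture(reluctant, "ab12")) // Some(12)
  }
}
```

Both patterns match the whole string; they differ only in how the input is divided between the prefix and the capturing group.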

Solutions

  1. Make .* reluctant (and not greedy), that is .*?, so in total

     """.*?(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}).*""".r
  2. Precisely define the pattern before the first number, e.g. if it consists only of letters, use

     """[a-zA-Z]*(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}).*""".r

You should use a reluctant quantifier instead of a greedy one:

val reg = """.*?(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}).*""".r
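To see the effect on the input from the question, here is a rough sketch that runs both suggested patterns against "ttt20.30.4.140ttt". The object name `IpExtract`, the helper `extractIp`, and the value given to `host` are assumptions for illustration; the question never shows how `host` is defined.

```scala
import scala.util.matching.Regex

// Hypothetical demo object, not part of the original question or answer.
object IpExtract {
  // Solution 1: reluctant prefix.
  val reluctantPrefix: Regex =
    """.*?(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}).*""".r
  // Solution 2: prefix restricted to letters.
  val lettersPrefix: Regex =
    """[a-zA-Z]*(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3}).*""".r

  // Rebuilds the dotted address from the four captured groups.
  def extractIp(pattern: Regex, input: String): Option[String] =
    input match {
      case pattern(one, two, three, four) => Some(s"$one.$two.$three.$four")
      case _                              => None
    }

  def main(args: Array[String]): Unit = {
    val host = "20.30.4.140" // assumed value; not given in the question
    val input = "ttt20.30.4.140ttt"
    println(extractIp(reluctantPrefix, input).contains(host)) // true
    println(extractIp(lettersPrefix, input).contains(host))   // true
  }
}
```

With either pattern the first group captures 20 rather than 0, so the comparison against `host` succeeds under the assumption above.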
