
Scala regex for the less-than-or-equal-to operator (<=)

I am trying to parse an expression that uses one of the comparison operators <, <=, >= or >. All of them except <= work fine. Can someone help me figure out what the issue could be? Code:

import scala.util.Try
import scala.util.parsing.combinator.RegexParsers

object MyTestParser extends RegexParsers {
  override def skipWhitespace = true

  private val expression: Parser[String] = """[a-zA-Z0-9\.]+""".r

  val operation: Parser[Try[Boolean]] =
    expression ~ ("<" | "<=" | ">=" | ">") ~ expression ^^ {
      case v1 ~ op ~ v2 => for {
        a <- Try(v1.toDouble)
        b <- Try(v2.toDouble)
      } yield op match {
        case "<" => a < b
        case "<=" => a <= b
        case ">" => a > b
        case ">=" => a >= b
      }
  }
}

Test:

"MyTestParser" should {
  "successfully parse <= condition" in {
    val parser = MyTestParser.parseAll(MyTestParser.operation, "10 <= 20")
    val result = parser match {
      case MyTestParser.Success(s, _) => s.get
      case MyTestParser.Failure(e, _) =>
        println(s"Parsing failed with error: $e")
        false
      case MyTestParser.Error(e, _) =>
        println(s"Parsing error: $e")
        false
    }
    result === true
  }

  "successfully parse >= condition" in {
    val result = MyTestParser.parseAll(MyTestParser.operation, "50 >= 20").get
    result === scala.util.Success(true)
  }
}

Error for <= condition:

Parsing failed with error: string matching regex `[a-zA-Z0-9\.]+' expected but `=' found

You need to change the order of the alternatives so that the longest options are checked first.

expression ~ ( "<=" | ">=" | ">" | "<") ~ expression ^^ {

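The | combinator tries the alternatives from left to right and stops at the first one that succeeds, so if the shorter "<" matches first, the other options are not considered at all.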
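For reference, here is a minimal sketch of the question's parser with only the alternatives reordered (longest first) and the unnecessary escape removed from the character class. It assumes the scala-parser-combinators module, which provides RegexParsers, is on the classpath; the object name FixedParser is hypothetical.

```scala
import scala.util.Try
import scala.util.parsing.combinator.RegexParsers

// Same grammar as MyTestParser, but "<=" and ">=" are listed before
// their one-character prefixes "<" and ">".
object FixedParser extends RegexParsers {
  override def skipWhitespace = true

  private val expression: Parser[String] = """[a-zA-Z0-9.]+""".r

  val operation: Parser[Try[Boolean]] =
    expression ~ ("<=" | ">=" | "<" | ">") ~ expression ^^ {
      case v1 ~ op ~ v2 =>
        for {
          a <- Try(v1.toDouble)
          b <- Try(v2.toDouble)
        } yield op match {
          case "<"  => a < b
          case "<=" => a <= b
          case ">"  => a > b
          case ">=" => a >= b
        }
    }
}
```

With this order, FixedParser.parseAll(FixedParser.operation, "10 <= 20").get should yield scala.util.Success(true), and the existing "<", ">" and ">=" cases keep working.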

Also note that a period does not need to be escaped inside a character class; this will do:

"""[a-zA-Z0-9.]+""".r
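A quick way to convince yourself that both character classes are equivalent is to compare them on a few inputs. This is a small sketch; the object and helper names are hypothetical, and the full-string check goes through the underlying java.util.regex.Pattern so that it works on older Scala versions too.

```scala
import scala.util.matching.Regex

object DotInCharClass {
  // Inside [...] a '.' is already a literal dot, so the backslash is redundant.
  val escaped: Regex   = """[a-zA-Z0-9\.]+""".r
  val unescaped: Regex = """[a-zA-Z0-9.]+""".r

  // True if the whole string matches the pattern (not just a substring).
  def fullMatch(r: Regex, s: String): Boolean =
    r.pattern.matcher(s).matches()
}
```

Both patterns accept "3.14" and reject "3,14" and "<=", so removing the escape does not change what the expression parser accepts.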

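Your problem is that "<" matches the first character of "<=", so the parser accepts "<" as the operator and moves on to parsing the right-hand expression, which then fails on the leftover "=". If you change the order so that "<=" comes first, that will be matched instead, and you will get the desired result.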
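To see exactly what the operator parser consumes, you can run it on its own and look at the remaining input. This is a minimal sketch assuming scala-parser-combinators is on the classpath; OrderDemo and matchAndRest are hypothetical names used only for illustration.

```scala
import scala.util.parsing.combinator.RegexParsers

object OrderDemo extends RegexParsers {
  // Order used in the question: "<" comes before "<=".
  val shortFirst: Parser[String] = "<" | "<=" | ">=" | ">"
  // Corrected order: longer operators come before their prefixes.
  val longFirst: Parser[String] = "<=" | ">=" | "<" | ">"

  // Runs p on the input and returns (matched operator, unconsumed rest).
  def matchAndRest(p: Parser[String], in: String): (String, String) =
    parse(p, in) match {
      case Success(op, next) => (op, next.source.toString.substring(next.offset))
      case ns: NoSuccess     => sys.error(ns.msg)
    }
}
```

On "<= 20", shortFirst returns "<" and leaves "= 20", which is the input the right-hand expression then chokes on in the question's error message; longFirst returns "<=" and leaves " 20".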

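@Prateek: it does not work because the parser combinator | works like a short-circuiting boolean OR. Once one of the alternatives in the chain succeeds at the current position, it does not try the remaining ones, and it does not go back to them if a later parser (here the right-hand expression) fails.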
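If you would rather not depend on the order of the alternatives at all, the parser combinator library also has a longest-match alternative, |||, which runs both sides and keeps the result that consumed more input. A minimal sketch, assuming scala-parser-combinators is on the classpath; LongestMatchParser is a hypothetical name.

```scala
import scala.util.parsing.combinator.RegexParsers

object LongestMatchParser extends RegexParsers {
  // ||| keeps whichever alternative consumed the most input, so "<" listed
  // before "<=" no longer shadows it.
  val op: Parser[String] = "<" ||| "<=" ||| ">" ||| ">="

  private val expression: Parser[String] = """[a-zA-Z0-9.]+""".r

  val comparison: Parser[String ~ String ~ String] = expression ~ op ~ expression
}
```

The trade-off is that ||| always evaluates every alternative, whereas | stops at the first success, so simply reordering with | is usually the cheaper fix.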

So when you use | between patterns and one pattern is a prefix of another (as "<" is of "<="), you have to place the longer one first.

As a general rule: order the patterns from the longest to the shortest.
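When the operators come from a list rather than being written out by hand, the general rule can be applied mechanically by sorting by length before combining them with |. This is a sketch under the assumption that scala-parser-combinators is on the classpath; SortedOpsParser and the operators list are hypothetical.

```scala
import scala.util.parsing.combinator.RegexParsers

object SortedOpsParser extends RegexParsers {
  // Hypothetical operator table; its order does not matter because it is
  // sorted before the alternatives are built.
  val operators: List[String] = List("<", "<=", ">", ">=")

  // Longest first (sortBy is stable), giving "<=" | ">=" | "<" | ">".
  val op: Parser[String] =
    operators.sortBy(s => -s.length).map(s => literal(s)).reduce(_ | _)
}
```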

Changing the relevant line like this makes it work:

 // '>=' / '>' already worked before for the same reason: '>=' was listed before '>'
 expression ~ ("<=" | "<" | ">=" | ">") ~ expression ^^ {

Or, if you want to follow the general rule:

 expression ~ ("<=" | ">=" | "<" | ">") ~ expression ^^ {
