
Scala Parser and combinators: java.lang.RuntimeException: string matching regex `\z' expected

I am trying to parse some text following a grammar for Dynamic Epistemic Logic using Scala's RegexParsers, as part of my master's thesis. But I keep getting the same error on simple logical conjunctions. I understand where and why it's failing, but not why it's matching what it is in the first place.

My Code (severely boiled down to isolate the problem):

import scala.util.parsing.combinator._

class Formula() {
      def and(q:Formula) = Conjunction(this, q) // ∧
}

abstract class Literal extends Formula
abstract class Constant extends Formula

case class Atom(symbol:String) extends Literal 
case class NotAtom(p:Atom) extends Literal

case class Conjunction(p:Formula, q:Formula) extends Formula

class mapParser extends RegexParsers {
    val conjOp = "&"
    val negOp = "~"

    val listseparator = ","

    val leftparen = "("
    val rightparen = ")"

    def id:Parser[String] = "[a-z_]+".r // fluents are never capitalized. but may have underscore.
    def litargs: Parser[String] = repsep("[a-zA-Z]+".r,listseparator) ^^ {case list => "(" + list.toString.stripPrefix("List") + ")"}

    def atom: Parser[Atom] = id~leftparen~litargs~rightparen ^^ {case head~_~tail~_ => Atom(head+tail)}
    def negAtom: Parser[NotAtom] = negOp~>atom ^^ (NotAtom(_))
    def literal: Parser[Literal] = negAtom | atom 

    def and: Parser[Formula] = formula~conjOp~formula ^^ {case p1~_~p2 => Conjunction(p1,p2)}  

    def formula: Parser[Formula] = literal | and
};

object DomainParser extends mapParser {
  def test() =  {
    val domainDesc ="present(A) & ~present(B)";

    println("input: " + domainDesc)
    println("result: " + apply(domainDesc))
  }

  def apply(domainDesc: String) = parseAll(formula, domainDesc) match {
    case Success(result, _) => result
    case failure : NoSuccess => scala.sys.error(failure.msg)
  }
}

I am calling the DomainParser.test() function externally from Java. The input is

present(A) & ~present(B)

which should yield:

Conjunction(Atom(present((A))),NotAtom(Atom(present((B)))))

but instead gives me the error:

Exception in thread "main" java.lang.RuntimeException: string matching regex `\z' expected but `&' found
    at scala.sys.package$.error(package.scala:27)
    at mAp.DomainParser$.apply(DEL.scala:48)
    at mAp.DomainParser$.test(DEL.scala:43)
    at mAp.DomainParser.test(DEL.scala)
    at ma.MA.main(MA.java:8)

Furthermore, if I call the 'and' parser directly instead of the 'formula' parser, it works fine. Hence the problem seems to be with this line:

def formula: Parser[Formula] = literal | and

This is because it attempts to parse the whole line as a single literal. It parses present(A) correctly, but instead of failing on the '&' (which is not part of the literal parser) and going back to parse the input as an 'and' term, it fails with the exception.

I cannot for the love of... see why it tries to match any '\z' at all. It is not included in the grammar by me, and even if it were, shouldn't it fail and try to parse as the next term instead of exiting with an exception? I am torn between thinking there is some built-in functionality for end-of-string terms that I do not know about, and thinking there is something hugely obvious staring me in the face.

Any help is sorely needed and very welcome. Thank you very much in advance.

Dan True

I'll just add a similar Parser for propositional formulas I made. Maybe this might help you.

'+' = top/true

'-' = bottom/false

'!' = negation

'&' = conjunction

'|' = disjunction

'>' = implication

'<' = equivalence

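import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// The Formula classes (Top, Bottom, Variable, Negation, Conjunction, ...) are assumed to be defined elsewhere.
object FormulaParser extends StandardTokenParsers with PackratParsers {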
  //Symbols for all connectives
  private val parseSymbols = List("(", ")", "+", "-", "!", "&", "|", ">", "<")
  lexical.delimiters ++= parseSymbols

  private lazy val formula: PackratParser[Formula] = implication | equivalence | conjunction | disjunction | term
  private lazy val formulaWithoutBrackets: PackratParser[Formula] = implication | equivalence | conjunction | disjunction | termWithoutBrackets

  private lazy val term: PackratParser[Formula] = top | bottom | variable | parens | negation
  private lazy val termWithoutBrackets = top | bottom | variable | negation

  private lazy val top: PackratParser[Formula] = "+" ^^^ { Top() }
  private lazy val bottom: PackratParser[Formula] = "-" ^^^ { Bottom() }
  private lazy val variable: PackratParser[Formula] = ident ^^ { Variable(_) }
  private lazy val parens: PackratParser[Formula] = "(" ~> formulaWithoutBrackets <~ ")"
  private lazy val negation: PackratParser[Formula] = "!" ~> term ^^ { Negation(_) }

  private lazy val conjunction: PackratParser[Formula] = term ~ "&" ~ term ~ rep("&" ~> term) ^^ {
    case p ~ "&" ~ q ~ conj => conj.foldLeft(Conjunction(p,q))((con, elem) => Conjunction(con, elem))
  }

  private lazy val disjunction: PackratParser[Formula] = term ~ "|" ~ term ~ rep("|" ~> term) ^^ {
    case p ~ "|" ~ q ~ disj => disj.foldLeft(Disjunction(p,q))((dis, elem) => Disjunction(dis, elem))
  }

  private lazy val implication: PackratParser[Formula] = (conjunction | disjunction | term) ~ ">" ~ (conjunction | disjunction | term) ^^ { case p ~ ">" ~ q => Implication(p, q) }

  private lazy val equivalence: PackratParser[Formula] = (conjunction | disjunction | term) ~ "<" ~ (conjunction | disjunction | term) ^^ { case p ~ "<" ~ q => Equivalence(p, q) }
}

With this you can parse input like: (p & q) | (!q > (r & s))

Here Conjunction and Disjunction also bind stronger than Implication and Equivalence.

p & q > r | s will result in Implication(Conjunction(Variable(p), Variable(q)), Disjunction(Variable(r), Variable(s)))
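The parser above has only private members and does not show its AST classes, so here is a condensed, self-contained sketch of how it could be wired up and called. The case class shapes, the object name PropParser and the parseFormula method are my assumptions, not part of the original code. Top, bottom and equivalence are left out to keep it short. It needs the scala-parser-combinators module on the classpath.

```scala
import scala.util.parsing.combinator.PackratParsers
import scala.util.parsing.combinator.syntactical.StandardTokenParsers

// Hypothetical AST: the class shapes are assumed, the answer does not show them.
sealed trait Formula
case class Variable(name: String) extends Formula
case class Negation(p: Formula) extends Formula
case class Conjunction(p: Formula, q: Formula) extends Formula
case class Disjunction(p: Formula, q: Formula) extends Formula
case class Implication(p: Formula, q: Formula) extends Formula

object PropParser extends StandardTokenParsers with PackratParsers {
  // Register the connective symbols with the lexer.
  lexical.delimiters ++= List("(", ")", "!", "&", "|", ">")

  lazy val formula: PackratParser[Formula] = implication | conjunction | disjunction | term
  lazy val term: PackratParser[Formula] = variable | parens | negation

  lazy val variable: PackratParser[Formula] = ident ^^ { Variable(_) }
  lazy val parens: PackratParser[Formula] = "(" ~> formula <~ ")"
  lazy val negation: PackratParser[Formula] = "!" ~> term ^^ { Negation(_) }

  // Two or more terms joined by "&", folded into a left-nested Conjunction.
  lazy val conjunction: PackratParser[Formula] =
    term ~ rep1("&" ~> term) ^^ { case p ~ rest => rest.foldLeft(p)(Conjunction(_, _)) }

  // Two or more terms joined by "|", folded into a left-nested Disjunction.
  lazy val disjunction: PackratParser[Formula] =
    term ~ rep1("|" ~> term) ^^ { case p ~ rest => rest.foldLeft(p)(Disjunction(_, _)) }

  // Both sides of ">" may be a conjunction, a disjunction or a single term.
  lazy val operand: PackratParser[Formula] = conjunction | disjunction | term
  lazy val implication: PackratParser[Formula] =
    operand ~ (">" ~> operand) ^^ { case p ~ q => Implication(p, q) }

  // Hypothetical entry point: tokenize the input and require that all of it is consumed.
  def parseFormula(input: String): ParseResult[Formula] =
    phrase(formula)(new lexical.Scanner(input))
}
```

PropParser.parseFormula("(p & q) | (!q > (r & s))") returns a ParseResult; check successful before calling get on it.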

Ok. If it's just due to the left-recursion I have a similar parser, where I resolved that.

You have to change the following:

def and: Parser[Formula] = literal~conjOp~literal~rep(conjOp ~> literal) ^^ {
  case p ~ _ ~ q ~ conj => conj.foldLeft(Conjunction(p,q))(Conjunction(_, _))
}
def formula: Parser[Formula] = and | literal

Since in the end there are only Literals connected via Conjunction, you can rewrite 'and' like that.
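A minimal sketch of why the order of the alternatives matters, assuming a cut-down grammar in which bare identifiers stand in for the atoms. The names ChoiceOrder, lit, parseLiteralFirst and parseAndFirst are made up for illustration. As far as I understand the library, '|' only tries its right-hand side when the left-hand side fails, and parseAll additionally requires the whole input to be consumed. In the library version used in the question, that end-of-input check appears to be where the '\z' regex comes from. The member is called lit rather than literal to stay clear of the implicit literal conversion that RegexParsers already defines.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical cut-down grammar: identifiers stand in for the question's atoms.
object ChoiceOrder extends RegexParsers {
  def lit: Parser[String] = "[a-z]+".r

  // 'and' starts with lit instead of formula, so it is not left-recursive.
  def and: Parser[String] = lit ~ ("&" ~> lit) ^^ { case p ~ q => s"And($p,$q)" }

  // lit succeeds on the first identifier, so 'and' is never tried;
  // parseAll then finds unconsumed input and reports a failure.
  def parseLiteralFirst(input: String): ParseResult[String] = parseAll(lit | and, input)

  // 'and' is tried first; lit is the fallback for input without "&".
  def parseAndFirst(input: String): ParseResult[String] = parseAll(and | lit, input)
}
```

With this, parseLiteralFirst("p & q") is a failure because the '&' is left over after lit has matched p, while parseAndFirst("p & q") and parseAndFirst("p") both succeed.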

Slightly more complex example with your code:

input: p(A,B) & ~p(B) & p(C)
result: Conjunction(Conjunction(Atom(p(A,B)),NotAtom(Atom(p(B)))),Atom(p(C)))
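For reference, here is a self-contained sketch of the question's parser with the change applied. It is not the exact code I ran. The object name FixedParser and the method parseFormula are hypothetical, lit replaces literal to avoid overloading the implicit literal conversion from RegexParsers, and litargs is assumed to render its arguments with mkString, which is what gives the p(A,B) form in the output above. It requires the scala-parser-combinators module.

```scala
import scala.util.parsing.combinator.RegexParsers

class Formula
abstract class Literal extends Formula
case class Atom(symbol: String) extends Literal
case class NotAtom(p: Atom) extends Literal
case class Conjunction(p: Formula, q: Formula) extends Formula

object FixedParser extends RegexParsers {
  // Fluent names are lower case and may contain underscores.
  def id: Parser[String] = "[a-z_]+".r

  // Assumption: arguments are joined with mkString, e.g. List(A, B) becomes "(A,B)".
  def litargs: Parser[String] = repsep("[a-zA-Z]+".r, ",") ^^ (_.mkString("(", ",", ")"))

  def atom: Parser[Atom] = id ~ ("(" ~> litargs <~ ")") ^^ { case head ~ args => Atom(head + args) }
  def negAtom: Parser[NotAtom] = "~" ~> atom ^^ (NotAtom(_))
  def lit: Parser[Literal] = negAtom | atom

  // One literal followed by at least one "& literal", folded to the left.
  def and: Parser[Formula] = lit ~ rep1("&" ~> lit) ^^ {
    case first ~ rest => rest.foldLeft(first: Formula)(Conjunction(_, _))
  }

  // 'and' must come first, otherwise lit succeeds and the "&" is left unparsed.
  def formula: Parser[Formula] = and | lit

  // Hypothetical entry point that returns the ParseResult instead of throwing.
  def parseFormula(input: String): ParseResult[Formula] = parseAll(formula, input)
}
```

FixedParser.parseFormula("present(A) & ~present(B)") then succeeds with a Conjunction of the two literals.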
