
How to combine Regexp and keywords in Scala parser combinators

I've seen two approaches to building parsers in Scala.

The first is to extend RegexParsers and define your own lexical patterns. The issue I see with this is that I don't really understand how it deals with keyword ambiguities. For example, if my keywords match the same pattern as idents, then it processes the keywords as idents.

To counter that, I've seen posts like this one that show how to use StandardTokenParsers to specify keywords. But then I don't understand how to specify the regexp patterns! Yes, StandardTokenParsers comes with "ident", but it doesn't come with the other ones I need (complex floating point number representations, specific string literal patterns and rules for escaping, etc.).

How do you get both the ability to specify keywords and the ability to specify token patterns with regular expressions?

I've only written RegexParsers-derived parsers, but what I do is something like this:

val name: Parser[String] = "[A-Z_a-z][A-Z_a-z0-9]*".r

val kwIf: Parser[String]    = "if\\b".r
val kwFor: Parser[String]   = "for\\b".r
val kwWhile: Parser[String] = "while\\b".r

val reserved: Parser[String] = ( kwIf | kwFor | kwWhile )

val identifier: Parser[String] = not(reserved) ~> name
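
To try the snippet above, it can be wrapped in an object that extends RegexParsers. This is only a minimal sketch: the object name KeywordDemo and the helper isIdentifier are made up for illustration, and it assumes the scala-parser-combinators library is on the classpath. The \b word boundary keeps a keyword from matching a prefix of a longer name, and not(reserved) consumes no input, so name still sees the whole word.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical wrapper object; only the parser definitions come from the answer.
object KeywordDemo extends RegexParsers {
  val name: Parser[String] = "[A-Z_a-z][A-Z_a-z0-9]*".r

  // \b requires a word boundary after the keyword, so "iffy" is not seen as "if".
  val kwIf: Parser[String]    = "if\\b".r
  val kwFor: Parser[String]   = "for\\b".r
  val kwWhile: Parser[String] = "while\\b".r

  val reserved: Parser[String] = kwIf | kwFor | kwWhile

  // not(...) succeeds only when reserved fails, and never consumes input.
  val identifier: Parser[String] = not(reserved) ~> name

  // Illustrative helper: true when the whole string parses as an identifier.
  def isIdentifier(s: String): Boolean = parseAll(identifier, s).successful
}
```

With these definitions, KeywordDemo.isIdentifier("iffy") and KeywordDemo.isIdentifier("format") are true, while KeywordDemo.isIdentifier("if") and KeywordDemo.isIdentifier("while") are false.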

This is similar to the answer from @randall-schulz, but uses an explicit negative lookahead in the regular expression itself.

Here, empty is a keyword but empty? should be an identifier. The negative lookahead fails the match (without consuming any characters) if empty is followed by any character that nameCharsRE matches. The kw helper function is reused for each such keyword:

  val nameCharsRE = "[^\\s\",'`()\\[\\]{}|;#]"

  private def kw(kw: String, token: Token) = positioned {
    (s"${kw}(?!${nameCharsRE})").r ^^ { _ => token }
  }
  private def empty        = kw("empty", EMPTY_KW())
  private def and          = kw("and", AND())
  private def or           = kw("or", OR())
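
The fragment above depends on token types that are not shown. A self-contained sketch might look like the following, where the Token hierarchy, the NAME token, the parser names emptyKw/andKw/orKw, and the lex helper are all assumptions added for illustration rather than part of the original answer. It also passes the token by name, so that each match builds a fresh instance before positioned stamps a position on it; the original passes it by value.

```scala
import scala.util.parsing.combinator.RegexParsers
import scala.util.parsing.input.Positional

// Hypothetical token types; the answer does not show its own definitions.
sealed trait Token extends Positional
case class EMPTY_KW() extends Token
case class AND() extends Token
case class OR() extends Token
case class NAME(text: String) extends Token

object LookaheadLexer extends RegexParsers {
  // Any character that may appear inside a name.
  val nameCharsRE = "[^\\s\",'`()\\[\\]{}|;#]"

  // Matches the keyword only when no name character follows it.
  // `token` is by-name (an assumption of this sketch) so every match gets a new instance.
  private def kw(word: String, token: => Token): Parser[Token] = positioned {
    s"$word(?!$nameCharsRE)".r ^^ { _ => token }
  }

  private def emptyKw = kw("empty", EMPTY_KW())
  private def andKw   = kw("and", AND())
  private def orKw    = kw("or", OR())

  // Assumed identifier rule: one or more name characters.
  private def name: Parser[Token] = positioned {
    s"$nameCharsRE+".r ^^ { s => NAME(s) }
  }

  // Keywords are tried first; the lookahead makes them fail on longer names.
  def token: Parser[Token] = emptyKw | andKw | orKw | name

  // Illustrative helper: all tokens, or None if the input cannot be tokenized.
  def lex(s: String): Option[List[Token]] = {
    val result = parseAll(rep(token), s)
    if (result.successful) Some(result.get) else None
  }
}
```

For example, LookaheadLexer.lex("empty and empty? or x") yields EMPTY_KW(), AND(), NAME("empty?"), OR(), NAME("x"), and "android" lexes as a single NAME because the lookahead rejects the and keyword. If a keyword could contain regex metacharacters, it would need to be quoted before being spliced into the pattern.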
