
Are Scala's/Haskell's parser combinators sufficient?

I'm wondering whether Scala's or Haskell's parser combinators are sufficient for parsing a programming language, more specifically the language MiniJava. I'm currently reading about compiler construction, and JFlex and Java CUP are quite painful to work with, so I'm wondering whether I could/should use parser combinators instead. The MiniJava syntax is very small. MiniJava's BNF: http://www.cambridge.org/us/features/052182060X/grammar.html

I've never used Scala, but the existence of a definitive BNF makes this easy.

The grammar translates trivially into Haskell's Text.ParserCombinators.Parsec:

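-- Goal ::= MainClass ( ClassDeclaration )* <EOF>
goal = do c <- mainClass
          cs <- many classDeclaration
          eof
          return $ c:cs
-- `token` stands for a keyword parser here; Parsec's own `token` has a different signature.
mainClass = do token "class"
               name <- identifier
               ...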

etc. The PArrows translation is pretty trivial too. You'll probably find it easier to have a distinct lexing phase before the parser, but you can do without one too.
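To illustrate the "do without a lexing phase" option, here is a minimal sketch that parses characters directly and skips whitespace after every token. It assumes the parsec 3 module layout (Text.Parsec), and `classHeader` is a hypothetical, heavily reduced stand-in for the real MainClass and ClassDeclaration rules (class bodies are left out entirely); `lexeme`, `symbol`, `keyword`, and `parseGoal` are names made up for this example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Run a parser, then skip trailing whitespace, so no separate lexer is needed.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- Punctuation such as "{" or "}".
symbol :: String -> Parser String
symbol = lexeme . try . string

-- A keyword must not run straight into more identifier characters.
keyword :: String -> Parser ()
keyword w = lexeme . try $ string w *> notFollowedBy alphaNum

-- Simplification: this does not reject keywords used as identifiers.
identifier :: Parser String
identifier = lexeme ((:) <$> letter <*> many (alphaNum <|> char '_'))

-- Hypothetical reduced rule: "class" Identifier "{" "}"
classHeader :: Parser String
classHeader = keyword "class" *> identifier <* symbol "{" <* symbol "}"

-- Same shape as Goal ::= MainClass ( ClassDeclaration )* <EOF>,
-- with both kinds of class reduced to classHeader.
goal :: Parser [String]
goal = spaces *> many1 classHeader <* eof

parseGoal :: String -> Either ParseError [String]
parseGoal = parse goal "<input>"
```

In GHCi, `parseGoal "class A { } class B { }"` should yield the two class names, while `parseGoal "classA { }"` is rejected because `keyword` refuses to split an identifier.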

I'm using Scala's parser combinators to parse PL/SQL code, and it works like a charm.

At least Parsec has a built-in lexer for Java-like languages:

lexer = makeTokenParser javaStyle

You have to define the reserved words yourself.
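Defining the reserved words might look roughly like the sketch below. It assumes the parsec 3 modules Text.Parsec.Token and Text.Parsec.Language; the keyword list is my own reading of the MiniJava BNF and may be incomplete, and `miniJavaDef`, `className`, and `parseClassName` are illustrative names.

```haskell
import Text.Parsec (ParseError, parse)
import Text.Parsec.String (Parser)
import Text.Parsec.Language (javaStyle)
import qualified Text.Parsec.Token as Tok

-- javaStyle extended with reserved words (assumed list, read off the BNF).
miniJavaDef :: Tok.LanguageDef ()
miniJavaDef = javaStyle
  { Tok.reservedNames =
      [ "class", "public", "static", "void", "main", "extends", "return"
      , "int", "boolean", "if", "else", "while", "true", "false"
      , "this", "new", "length" ]
  , Tok.caseSensitive = True  -- Java keywords are case-sensitive
  }

lexer :: Tok.TokenParser ()
lexer = Tok.makeTokenParser miniJavaDef

-- Identifier that fails on any of the reserved words above.
identifier :: Parser String
identifier = Tok.identifier lexer

-- Matches exactly the given keyword, not a longer identifier starting with it.
reserved :: String -> Parser ()
reserved = Tok.reserved lexer

-- "class" Identifier, the opening of MainClass / ClassDeclaration in the BNF.
className :: Parser String
className = reserved "class" *> identifier

parseClassName :: String -> Either ParseError String
parseClassName = parse (Tok.whiteSpace lexer *> className) "<input>"
```

The token parsers skip whitespace and Java-style comments after each token, so `parseClassName "/* c */ class Foo {"` should succeed, whereas `parseClassName "class while"` should fail because `while` is reserved.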

Scala's parser is a backtracking parser, so it can deal with pretty much any BNF or EBNF. It also means, though, that there are edge cases where input can be painfully slow to read.

If the grammar can be changed into an LL(1) grammar, you can use the ~! operator to keep backtracking to a minimum.

The grammar probably can be turned into LL(1), but, as written, it is not. See, for instance, that Expression and Statement have First/First conflicts (look this up at the end of the linked article).
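One such conflict is in Statement: the alternatives `Identifier "=" Expression ";"` and `Identifier "[" Expression "]" "=" Expression ";"` both start with Identifier. The sketch below shows the usual left-factoring fix, written with Parsec rather than Scala so it matches the code earlier in the thread. Parsec's `<|>` only tries the second alternative when the first fails without consuming input, so the shared prefix is parsed once and the next token picks the branch. `expr` is a deliberately tiny stand-in for the real Expression rule (an assumption), and the data types and helper names are made up for this example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Expr = IntLit Integer | Var String
  deriving (Eq, Show)

data Stmt
  = Assign String Expr            -- Identifier "=" Expression ";"
  | ArrayAssign String Expr Expr  -- Identifier "[" Expression "]" "=" Expression ";"
  deriving (Eq, Show)

-- Run a parser, then skip trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

symbol :: Char -> Parser Char
symbol = lexeme . char

identifier :: Parser String
identifier = lexeme (many1 letter)

-- Stand-in for the full Expression rule (assumption: literals and variables only).
expr :: Parser Expr
expr = IntLit . read <$> lexeme (many1 digit) <|> Var <$> identifier

-- Left-factored: Identifier is parsed once, then '[' or '=' selects the
-- alternative with one token of lookahead and no `try`.
assignment :: Parser Stmt
assignment = do
  name <- identifier
  stmt <- ArrayAssign name <$> (symbol '[' *> expr <* symbol ']') <*> (symbol '=' *> expr)
      <|> Assign name <$> (symbol '=' *> expr)
  _ <- symbol ';'
  return stmt

parseStmt :: String -> Either ParseError Stmt
parseStmt = parse (spaces *> assignment <* eof) "<input>"
```

Both `parseStmt "x = 1;"` and `parseStmt "a[i] = 42;"` should succeed without any backtracking. The left recursion in Expression is a separate issue that this sketch does not address.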

Anyway, for an academic project, it is enough. For real-life compiler work, you'll need faster parsers.

Programming in Scala (p. 647) says:

It [Scala's parser combinator framework] is much easier to understand and to adapt than a parser generator, and the difference in speed would often not matter in practice, unless you want to parse very large inputs.

As I would not classify source code as very large input (ideally), it should be sufficient.

I haven't worked with the Scala or Haskell parser combinator libraries, but it looks like the grammar should be fine.


 