
Generalized Bottom-up Parser Combinators in Haskell

I wonder why Haskell has no generalized parser combinators for bottom-up parsing, comparable to the Parsec combinators for top-down parsing.
(I could find some research work from around 2004, but nothing since:
https://haskell-functional-parsing.googlecode.com/files/Ljunglof-2002a.pdf http://www.di.ubi.pt/~jpf/Site/Publications_files/technicalReport.pdf )

Is there a specific reason why this has not been done?

This is because of referential transparency. Just as no function can tell the difference between

let x = 1:x
let x = 1:1:1:x
let x = 1:1:1:1:1:1:1:1:1:...  -- if this were writeable

no function can tell the difference between a grammar which is a finite graph and a grammar which is an infinite tree. Bottom-up parsing algorithms need to be able to see the grammar as a graph, in order to enumerate all the possible parsing states.
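To make the point about indistinguishable definitions concrete, here is a minimal sketch. The names `ones1`, `ones3` and `agreeUpTo` are made up for illustration. The two lists are tied into cycles of different lengths, yet a pure function can only inspect finite prefixes, and those always coincide.

```haskell
-- Two cyclic definitions that differ only in how the knot is tied.
ones1 :: [Int]
ones1 = 1 : ones1

ones3 :: [Int]
ones3 = 1 : 1 : 1 : ones3

-- A pure observer can only look at finite prefixes of the lists,
-- so it has no way to recover the length of the underlying cycle.
agreeUpTo :: Int -> Bool
agreeUpTo n = take n ones1 == take n ones3
```

The same reasoning applies to a grammar built from recursive parser values: the library only ever sees the unfolded tree, not the cycle that produced it.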

The fact that top-down parsers see their input as infinite trees allows them to be more powerful, since the tree could be computationally more complex than any graph could be; for example,

numSequence n = string (show n) *> option () (numSequence (n+1))

accepts any finite ascending sequence of numbers starting at n. This has infinitely many different parsing states. (It might be possible to represent this in a context-free way, but I think it would be tricky and require more understanding of the code than a parsing library is capable of.)
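For readers who want to try this, the following is a self-contained sketch of the same parser using the `parsec` package. The type signature and the helper `accepts` are additions for illustration and are not part of the original answer.

```haskell
import Text.Parsec (parse, string, option, eof)
import Text.Parsec.String (Parser)

-- The parser from the answer, with an explicit type.
-- Each recursive call builds a fresh parser for the next number,
-- so the "grammar" is an infinite tree indexed by n.
numSequence :: Int -> Parser ()
numSequence n = string (show n) *> option () (numSequence (n + 1))

-- Hypothetical helper: does the whole input consist of the
-- consecutive numbers n, n+1, ... written without separators?
accepts :: Int -> String -> Bool
accepts n s =
  either (const False) (const True) (parse (numSequence n <* eof) "" s)
```

For example, `accepts 1 "123"` succeeds while `accepts 1 "124"` fails, because after reading `1` and `2` the parser expects `3`.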

A bottom-up combinator library could be written, though it is a bit ugly, by requiring all parsers to be "labelled" in such a way that

  • the same label always refers to the same parser, and
  • there is only a finite set of labels

at which point it begins to look a lot more like a traditional specification of a grammar than a combinatory specification. However, it could still be nice; you would only have to label recursive productions, which would rule out any infinitely large rules such as numSequence.
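A rough sketch of what the labelled approach buys you, under the assumption that a grammar is stored as a finite map from labels to productions. The types `Sym` and `Grammar`, the example grammar, and the function `reachable` are all hypothetical. Because labels can be compared for equality, the grammar can be traversed as a graph and the traversal terminates, which is the property a bottom-up algorithm needs in order to enumerate its states.

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- A symbol is either a terminal character or a labelled nonterminal.
data Sym = T Char | N String
  deriving (Eq, Ord, Show)

-- Each label maps to its list of alternatives (sequences of symbols).
type Grammar = Map.Map String [[Sym]]

-- expr ::= term '+' expr | term
-- term ::= 'x' | '(' expr ')'
exampleGrammar :: Grammar
exampleGrammar = Map.fromList
  [ ("expr", [[N "term", T '+', N "expr"], [N "term"]])
  , ("term", [[T 'x'], [T '(', N "expr", T ')']])
  ]

-- Collect every label reachable from the start label.
-- The 'seen' set is what stops the traversal from looping forever
-- on recursive productions.
reachable :: Grammar -> String -> Set.Set String
reachable g start = go Set.empty [start]
  where
    go seen [] = seen
    go seen (l : ls)
      | l `Set.member` seen = go seen ls
      | otherwise =
          let alts = Map.findWithDefault [] l g
              next = [m | alt <- alts, N m <- alt]
          in go (Set.insert l seen) (next ++ ls)
```

A rule such as numSequence has no place in this representation, since it would need one label per value of n.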

As luqui's answer indicates, a bottom-up parser combinator library is not realistic. On the chance that someone gets to this page just looking for Haskell's bottom-up parser generator, what you are looking for is called the Happy parser generator. It is like yacc for Haskell.

As luqui said above: Haskell's treatment of recursive parser definitions does not permit the definition of bottom-up parsing libraries. Bottom-up parsing libraries are possible, though, if you represent recursive grammars differently. With apologies for the self-promotion, one (research) parser library that uses such an approach is grammar-combinators. It implements a grammar transformation called the uniform Paull transformation that can be combined with the top-down parser algorithm to obtain a bottom-up parser for the original grammar.

@luqui essentially says that there are cases in which sharing is unobservable. However, that is not the case in general: many approaches to observable sharing exist. E.g. http://www.ittc.ku.edu/~andygill/papers/reifyGraph.pdf mentions a few different methods to achieve observable sharing and proposes its own new method:

This looping structure can be used for interpretation, but not for further analysis, pretty printing, or general processing. The challenge here, and the subject of this paper, is how to allow trees extracted from Haskell hosted deep DSLs to have observable back-edges, or more generally, observable sharing. This a well-understood problem, with a number of standard solutions.

Note that the "ugly" solution of @luqui is mentioned by the paper under the name of explicit labels. The solution proposed by the paper is still "ugly" as it uses so-called "stable names", but other solutions such as http://www.cs.utexas.edu/~wcook/Drafts/2012/graphs.pdf (which relies on PHOAS) may work.
