
Parse Haskell itself with parser combinators

Given parser combinators as defined by libraries such as Parsec, Attoparsec, or various other functional implementations, is it possible to parse languages such as C or Haskell themselves?

Here is an example of what I have in mind:

-- constructor defined by its name, and a list of arguments           
data Constructor = Constructor String [Type]

-- type definition contains a type name, list of type variables, and a list of constructors
data Type = Type String [Char] [Constructor] 

In this very simplified example, parsing of a type could be:

typeParser :: Parser Type
typeParser = do
  _ <- string "data"
  spaces
  name <- many1 letter       -- the type name
  spaces
  typeVars <- many1 letter   -- single-letter type variables, collected as [Char]
  ...

I noticed that the package http://hackage.haskell.org/package/haskell-src-1.0.3.1 parses the Haskell 98 language, but it does not depend on any of the parser combinator libraries.

TL;DR Yes, you can parse Haskell using a monadic parser combinator library like Parsec.

Some programming languages, Haskell among them, are not fully context-free: some contextual information is needed in order to parse them. Haskell is not fully context-free because it is indentation-sensitive.

Some monadic parser combinator libraries, such as Parsec and Megaparsec, make it easier to parse context-sensitive languages. Parsec's ParsecT and Parsec types can keep track of contextual information, which the library refers to as "user state"; this is what lets a parser handle the context-sensitive parts of a language, such as the current indentation level. The user state is accessed through the getState, putState, and modifyState functions. The tricky part is mixing parsers whose user states have different types (I am currently developing a fork of Parsec that, among other things, makes this easier).
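As a rough illustration of the user-state idea, here is a minimal sketch, not a real Haskell layout parser. It assumes a toy format in which a header such as "do:" is followed by words that must sit to the right of the header's column. The names Parser, indented, and block are made up for this example; getState, putState, getPosition, sourceColumn, and parserFail come from Text.Parsec.

```haskell
import Text.Parsec

-- Assumption for this sketch: the user state is the column at which
-- the enclosing block started.
type Parser = Parsec String Int

-- Succeed only if the current position is strictly to the right of
-- the column stored in the user state.
indented :: Parser ()
indented = do
  ref <- getState
  col <- sourceColumn <$> getPosition
  if col > ref then pure () else parserFail "expected an indented item"

-- Parse "name:" followed by one or more indented words.
block :: Parser (String, [String])
block = do
  outer <- getState                      -- remember the enclosing level
  col   <- sourceColumn <$> getPosition
  name  <- many1 letter
  _     <- char ':'
  putState col                           -- items must be right of this column
  items <- many1 (try (spaces *> indented *> many1 letter))
  putState outer                         -- restore the enclosing level
  pure (name, items)
```

Running it with runParser block 0 "" "do:\n  foo\n  bar\nbaz" should stop before "baz", because that word is back at the header's column and the indented check fails for it.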

It is possible to use approaches other than monadic parser combinators, but they are often more limited or less straightforward, and can require more workarounds to get working. For example, a parser generator such as Flex/Bison could be used to parse the context-free parts of Haskell, but a workaround would be required for the context-sensitive parts, because such generators work from context-free grammars.
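One common shape for such a workaround is a pre-pass that turns indentation into explicit tokens before the context-free parser runs. The sketch below only illustrates that idea and is not Haskell's actual layout algorithm: Token and layout are hypothetical names, only spaces count as indentation, and a dedent to a column that was never opened is not reported as an error.

```haskell
-- Explicit layout tokens that a context-free grammar can refer to.
data Token = Indent | Dedent | Line String
  deriving (Eq, Show)

-- Convert leading spaces into Indent/Dedent tokens, keeping a stack of
-- the indentation levels that are currently open (innermost first).
layout :: String -> [Token]
layout src = go [0] (filter (not . all (== ' ')) (lines src))
  where
    -- At end of input, close every level except the outermost one.
    go stack [] = map (const Dedent) (drop 1 stack)
    go stack@(top : _) (l : ls)
      | n > top   = Indent : Line body : go (n : stack) ls
      | otherwise = map (const Dedent) closed ++ Line body : go open ls
      where
        (pad, body)    = span (== ' ') l
        n              = length pad
        (closed, open) = span (> n) stack
    go [] _ = []  -- unreachable: the stack always contains level 0
```

A grammar written for the parser generator can then treat Indent and Dedent like opening and closing braces.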
