Types for parser combinators

If I have a parser a : Parser A and a parser b : Parser B, then I can combine them into a parser a | b : Parser (Either A B). This works, but it gets a little tricky when you start adding more alternatives and end up with types like Either A (Either B C). I can imagine flattening that type into something like Alternative A B C. Is there a standard transformation I can perform, or am I stuck generating a whole bunch of boilerplate for types like Alternative A B C ...?

So the interesting thing about Either is that you can use it as a type-level cons operator.

A `Either` (B `Either` (C `Either` (D `Either` Void))) --> [A,B,C,D]

So all we need to do is make that explicit. You'll need ghc-7.8 for closed type families:

{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}
{-# LANGUAGE DataKinds #-}
-- ...

type family OneOf (as :: [*]) :: * where
  OneOf '[a] = a
  OneOf (a ': as) = Either a (OneOf as)

Now you can write your types much more succinctly:

aorborc :: Parser (OneOf '[A, B, C])
aorborc = a | (b | c)

It's still Either under the hood, so you can still easily interoperate with all existing code that uses Either, which is nice.
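As a minimal, self-contained sketch of the idea above: the original does not say which Parser library is in use, so this assumes ReadP from base as a stand-in. The operator <||> is a hypothetical replacement for the question's a | b notation (| is not a legal operator name in Haskell), and number, word, bang, anyToken and runToken are made-up names for illustration. It also uses Type from Data.Kind instead of *, which assumes GHC 8.0 or later.

```haskell
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-}

import Data.Char (isAlpha, isDigit)
import Data.Kind (Type)
import Text.ParserCombinators.ReadP (ReadP, char, eof, munch1, readP_to_S, (<++))

-- Flatten a type-level list into right-nested Eithers.
type family OneOf (as :: [Type]) :: Type where
  OneOf '[a]      = a
  OneOf (a ': as) = Either a (OneOf as)

-- Hypothetical stand-in for the question's `a | b`:
-- try the left parser first, tag each result with Left or Right.
(<||>) :: ReadP a -> ReadP b -> ReadP (Either a b)
p <||> q = (Left <$> p) <++ (Right <$> q)

-- Three toy parsers with three different result types.
number :: ReadP Int
number = read <$> munch1 isDigit

word :: ReadP String
word = munch1 isAlpha

bang :: ReadP Char
bang = char '!'

-- OneOf '[Int, String, Char] reduces to Either Int (Either String Char).
anyToken :: ReadP (OneOf '[Int, String, Char])
anyToken = number <||> (word <||> bang)

-- Run the parser and require that it consumes the whole input.
runToken :: String -> Maybe (OneOf '[Int, String, Char])
runToken s = case readP_to_S (anyToken <* eof) s of
  [(v, "")] -> Just v
  _         -> Nothing
```

Loaded in GHCi, runToken "abc" should come back as a plain nested Either value, so existing functions such as either keep working on the result without any conversion.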

Either is just one possible sum type in Haskell. Its ready-made class instances and helper functions make it useful in many cases, but it becomes considerably clunkier when you nest it.
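To make the clunkiness concrete, here is a small sketch of consuming a three-way nested Either. The type synonym and both function names are made up for illustration.

```haskell
-- Hypothetical example: a value that is an Int, a String, or a Bool,
-- encoded as nested Eithers.
type IntOrStringOrBool = Either Int (Either String Bool)

-- Pattern matching has to spell out the nesting path to each alternative.
describe :: IntOrStringOrBool -> String
describe (Left n)          = "int " ++ show n
describe (Right (Left s))  = "string " ++ s
describe (Right (Right b)) = "bool " ++ show b

-- The same function written with the `either` helper from the Prelude.
describeWithEither :: IntOrStringOrBool -> String
describeWithEither =
  either (\n -> "int " ++ show n)
         (either (\s -> "string " ++ s) (\b -> "bool " ++ show b))
```

Every extra alternative adds another layer of Right (...) to each pattern, and nothing in the type says what the alternatives mean.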

The best approach for a parser is to create your own data type that mirrors the structure you're parsing, and to parse directly into that. Let's make a partial toy example for a toy language.

data Statement = TypeDec String Type
               | DataDec String [Constructor]
               | FunctionDec String LambdaExpression

-- typeExpr replaces the original name `type`, which is a reserved word
statement :: Parser Statement
statement = TypeDec <$> (string "type " *> identifier) <*> (string " = " *> typeExpr)
            <|> DataDec <$> (string "data " *> identifier) <*> (string " = " *> many constructor)
            <|> FunctionDec <$> identifier <*> (string " = " *> lambdaExpression)

In this way, both your data structure and your code mirror the productions in the grammar you're parsing. The great benefit of that is that your data is type safe, clear and ready to use as soon as it's parsed.

(<$>, <*>, *> and <* all have the same fixity, infixl 4, so the brackets around each string "..." *> parser group are required; without them the expression associates to the left and does not type-check.)
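For a version that can actually be loaded and run, here is a sketch under several assumptions. The answer leaves the parser library and the helper parsers undefined, so this uses ReadP from base, and the definitions of Type, Constructor, LambdaExpression, identifier, typeExpr, constructor, lambdaExpression and parseStatement are all simplified placeholders invented for the example. It also separates constructors with " | " via sepBy1 rather than using many, which is a change from the code above.

```haskell
import Control.Applicative ((<|>))
import Data.Char (isAlpha)
import Text.ParserCombinators.ReadP (ReadP, eof, munch1, readP_to_S, sepBy1, string)

-- Placeholder syntax types: each is just a wrapped name in this sketch.
newtype Type = TypeName String deriving (Show, Eq)
newtype Constructor = Constructor String deriving (Show, Eq)
newtype LambdaExpression = Var String deriving (Show, Eq)

data Statement
  = TypeDec String Type
  | DataDec String [Constructor]
  | FunctionDec String LambdaExpression
  deriving (Show, Eq)

-- Placeholder sub-parsers: an identifier is a non-empty run of letters.
identifier :: ReadP String
identifier = munch1 isAlpha

typeExpr :: ReadP Type
typeExpr = TypeName <$> identifier

constructor :: ReadP Constructor
constructor = Constructor <$> identifier

lambdaExpression :: ReadP LambdaExpression
lambdaExpression = Var <$> identifier

-- One alternative per production, each building its own constructor.
statement :: ReadP Statement
statement =
      TypeDec     <$> (string "type " *> identifier) <*> (string " = " *> typeExpr)
  <|> DataDec     <$> (string "data " *> identifier)
                  <*> (string " = " *> (constructor `sepBy1` string " | "))
  <|> FunctionDec <$> identifier <*> (string " = " *> lambdaExpression)

-- Succeed only on a single parse that consumes the whole input.
parseStatement :: String -> Maybe Statement
parseStatement s = case readP_to_S (statement <* eof) s of
  [(st, "")] -> Just st
  _          -> Nothing
```

The result of parseStatement is an ordinary Statement, so downstream code pattern matches on TypeDec, DataDec and FunctionDec directly instead of unpacking Left and Right.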
