Parsing Haskell custom data types

I have worked my way through the Haskell Koans provided here: https://github.com/roman/HaskellKoans

I am stuck on the last two Koans, both of which involve parsing custom algebraic data types. Here is the first:

data Atom = AInt Int | ASym Text deriving (Eq, Show)

testAtomParser :: Test
testAtomParser = testCase "atom parser" $ do
    -- Change parser with the correct parser to use
    --
    let parser = <PARSER HERE> :: P.Parser Atom
    assertParse (ASym "ab") $ P.parseOnly parser "ab"
    assertParse (ASym "a/b") $ P.parseOnly parser "a/b"
    assertParse (ASym "a/b") $ P.parseOnly parser "a/b c"
    assertParse (AInt 54321) $ P.parseOnly parser "54321"

How can I define the variable parser so that it parses the algebraic data type Atom and passes these assertions?

I.

Parsers of an ADT tend to reflect the shape of the ADT. Your ADT is made of two disjoint parts (the AInt and ASym constructors), so your parser probably has two disjoint parts as well:

atom = _ <|> _
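To make "one branch per constructor" concrete, here is a small sketch using Text.ParserCombinators.ReadP from base rather than the koan's attoparsec parser (an assumption, so it runs with no extra packages). Token is a hypothetical two-constructor type standing in for Atom. ReadP's <|> is symmetric choice (it keeps every branch that succeeds) rather than attoparsec's try-left-then-right, but since the two branches here never match the same character, that makes no difference.

```haskell
import Control.Applicative ((<|>))
import Data.Char (isAlpha, isDigit)
import Text.ParserCombinators.ReadP (ReadP, readP_to_S, satisfy)

-- Hypothetical two-constructor type standing in for Atom.
data Token = TDigit Char | TLetter Char
  deriving (Eq, Show)

-- One branch per constructor, so the parser mirrors the type's shape.
token :: ReadP Token
token = (TDigit <$> satisfy isDigit) <|> (TLetter <$> satisfy isAlpha)

-- All (result, leftover input) pairs the parser can produce.
runToken :: String -> [(Token, String)]
runToken = readP_to_S token
```

For example, runToken "7x" gives [(TDigit '7',"x")], and runToken "/" gives [] because neither branch matches.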

II.

Assuming we know how to parse a single digit (let's call that basic parser digit), we can parse a (non-negative) integer by just repeating it:

natural = let loop = digit >> loop in loop

This parser has no base case: it keeps consuming digits, throws each one away, and never stops on its own, so on any finite input it fails as soon as the digits run out. Can we do better? Not with just a Monad instance, unfortunately. We need another basic combinator, many, which runs some other parser 0 or more times and accumulates the results into a list. We'll adjust this slightly, since an empty parse isn't a valid number:

many1 :: Parser a -> Parser [a]
many1 p = do x  <- p
             xs <- many p
             return (x:xs)

natural = many1 digit  -- replaces the looping definition above
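Here is a sketch of both attempts, again using ReadP from base instead of attoparsec (an assumption, so it runs without extra packages). digit is taken to be satisfy isDigit, and loopDigits, many1' and fullNatural are illustrative names. One difference to keep in mind: ReadP's many returns every possible number of repetitions rather than only the longest, so fullNatural adds eof to pick the parse that uses all of the input.

```haskell
import Data.Char (isDigit)
import Text.ParserCombinators.ReadP (ReadP, eof, many, readP_to_S, satisfy)

-- The basic single-digit parser the answer calls `digit`.
digit :: ReadP Char
digit = satisfy isDigit

-- First attempt: no base case, so on finite input it always fails
-- once the digits run out (and it discards every digit it reads).
loopDigits :: ReadP ()
loopDigits = let loop = digit >> loop in loop

-- The answer's many1, written by hand on top of ReadP's `many`.
-- (ReadP also exports a ready-made many1.)
many1' :: ReadP a -> ReadP [a]
many1' p = do
  x <- p
  xs <- many p
  return (x : xs)

-- One or more digits, kept as a String.
natural :: ReadP String
natural = many1' digit

-- Require the whole input to be digits.
fullNatural :: ReadP String
fullNatural = natural <* eof
```

readP_to_S fullNatural "54321" yields [("54321","")], while readP_to_S loopDigits "123" yields [].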

III.

What about symbols? To pass the test cases, it appears that a symbol must be one or more letters or forward slashes. Again, this disjoint structure can be expressed directly in our parser:

sym = many1 (_ <|> _)

We'll again use some built-in simple parser combinators to build up what we want, say satisfy :: (Char -> Bool) -> Parser Char, which matches any single character that satisfies some predicate. From it we can immediately build another useful combinator, char c = satisfy (== c), of type Char -> Parser Char, and then we're done:

sym = many1 (char '/' <|> satisfy isAlpha)

where isAlpha (from Data.Char) is a predicate much like the regex [a-zA-Z], except that it accepts any Unicode letter.
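A ReadP sketch of the symbol parser (base only, an assumption in place of attoparsec), with char defined from satisfy exactly as above. ReadP's many1 returns every possible length, so parseLongest is a hypothetical helper that keeps the parse consuming the most input, which is what a greedy many1 would give.

```haskell
import Control.Applicative ((<|>))
import Data.Char (isAlpha)
import Data.List (minimumBy)
import Data.Ord (comparing)
import Text.ParserCombinators.ReadP (ReadP, many1, readP_to_S, satisfy)

-- `char` built from `satisfy`, as described above.
char :: Char -> ReadP Char
char c = satisfy (== c)

-- One or more forward slashes or letters.
sym :: ReadP String
sym = many1 (char '/' <|> satisfy isAlpha)

-- Hypothetical helper: keep the parse that consumed the most input,
-- imitating a greedy many1 (ReadP returns every possible length).
parseLongest :: ReadP a -> String -> Maybe a
parseLongest p input =
  case readP_to_S p input of
    [] -> Nothing
    results -> Just (fst (minimumBy (comparing (length . snd)) results))
```

parseLongest sym "a/b c" gives Just "a/b": the space is neither a letter nor a slash, so the symbol ends there.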

IV.

So now we have the core of our parser:

natural <|> sym :: Parser String

The many1 combinator lifts our character parsers into parsers of lists of characters (Strings!). This lifting is also the basic idea behind building ADT parsers: we want to lift our Parser String up into a Parser Atom. One way to do it would be to use a function toAtom :: String -> Atom, which we could then fmap over the Parser:

atom' :: Parser Atom
atom' = fmap toAtom (natural <|> sym)

but a function of type String -> Atom defeats the purpose of building a parser in the first place: it would have to re-examine the string to work out whether it holds digits or a symbol, which the parser had already determined.
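A sketch of what toAtom would have to look like, in ReadP with a simplified Atom whose ASym holds a String instead of Text (both assumptions, to stay within base). toAtom is a hypothetical illustration: it has no idea which branch of natural <|> sym succeeded, so it re-scans the string with all isDigit to rediscover it.

```haskell
import Control.Applicative ((<|>))
import Data.Char (isAlpha, isDigit)
import Text.ParserCombinators.ReadP (ReadP, char, many1, readP_to_S, satisfy)

-- Simplified Atom: ASym holds a String here (the koan's version holds Text).
data Atom = AInt Int | ASym String
  deriving (Eq, Show)

natural, sym :: ReadP String
natural = many1 (satisfy isDigit)
sym = many1 (char '/' <|> satisfy isAlpha)

-- Hypothetical toAtom: it only sees a String, so it must re-scan it to
-- rediscover which branch of the parser matched. It would also fail on
-- the empty string, which natural <|> sym never produces.
toAtom :: String -> Atom
toAtom s
  | all isDigit s = AInt (read s)
  | otherwise = ASym s

atom' :: ReadP Atom
atom' = fmap toAtom (natural <|> sym)
```

The digit check in toAtom duplicates the work natural already did.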

As stated in I., the important part is that the shape of the ADT is reflected in the shape of our atom parser. We'll need to take advantage of that to build our final parser.

V.

We need to take advantage of the information in the structure of our atom parser. Let's instead build two functions:

liftInt :: String -> Atom  -- creates `AInt`s
liftSym :: String -> Atom  -- creates `ASym`s

liftInt = AInt . read
liftSym = ASym . T.pack  -- ASym holds Text; assumes import qualified Data.Text as T

Each of these both states a way of turning a String into an Atom and declares which kind of Atom we're dealing with. Note that liftInt will throw a runtime error if we pass it a string that cannot be parsed as an Int. Fortunately, natural only ever produces non-empty strings of digits, which is exactly what read needs:

atomInt :: Parser Atom
atomInt = liftInt <$> natural

atomSym :: Parser Atom
atomSym = liftSym <$> sym

atom'' = atomInt <|> atomSym

Now our atom'' parser takes advantage of the guarantee that natural only returns strings that are valid natural numbers (so our call to read will not fail!), and it tries to build an AInt and then an ASym, one after the other, in a disjoint structure just like the structure of our ADT.
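The same structure as a ReadP sketch, under the same assumptions as before (base only, ASym holding a String, so liftSym is just ASym). The constructor is now chosen by whichever branch of atom'' succeeds, and liftInt only ever sees strings that natural accepted.

```haskell
import Control.Applicative ((<|>))
import Data.Char (isAlpha, isDigit)
import Text.ParserCombinators.ReadP (ReadP, char, many1, readP_to_S, satisfy)

-- Simplified Atom: ASym holds a String here (the koan's version holds Text).
data Atom = AInt Int | ASym String
  deriving (Eq, Show)

natural, sym :: ReadP String
natural = many1 (satisfy isDigit)
sym = many1 (char '/' <|> satisfy isAlpha)

-- Each lifting function fixes the constructor up front.
liftInt, liftSym :: String -> Atom
liftInt = AInt . read -- safe: only ever applied to natural's digit strings
liftSym = ASym

atomInt, atomSym, atom'' :: ReadP Atom
atomInt = liftInt <$> natural
atomSym = liftSym <$> sym
-- One branch per constructor, mirroring the shape of Atom.
atom'' = atomInt <|> atomSym
```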

VI.

The whole shebang is thus:

-- ASym holds Text, hence T.pack (assumes import qualified Data.Text as T)
atom''' =     AInt . read <$> many1 digit
          <|> ASym . T.pack <$> many1 (    char '/'
                                       <|> satisfy isAlpha)

This shows the fun of parser combinators. The whole thing is built from the ground up using tiny, simple, composable pieces. Each one does a very small job, but together they span a large space of parsers.
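To check the one-liner against the koan's four assertions, here is a ReadP version (base only, ASym holding a String; both are assumptions). ReadP's many1 returns every possible length, so the hypothetical parseOnly' keeps the parse that consumed the most input and, like attoparsec's parseOnly, allows trailing input. It is a stand-in for illustration, not attoparsec's API.

```haskell
import Control.Applicative ((<|>))
import Data.Char (isAlpha, isDigit)
import Data.List (minimumBy)
import Data.Ord (comparing)
import Text.ParserCombinators.ReadP (ReadP, char, many1, readP_to_S, satisfy)

-- Simplified Atom: ASym holds a String here (the koan's version holds Text).
data Atom = AInt Int | ASym String
  deriving (Eq, Show)

-- The whole parser in one expression, as in section VI.
atom''' :: ReadP Atom
atom''' =
      AInt . read <$> many1 (satisfy isDigit)
  <|> ASym <$> many1 (char '/' <|> satisfy isAlpha)

-- Hypothetical stand-in for parseOnly plus a greedy many1:
-- keep the parse that consumed the most input; trailing input is allowed.
parseOnly' :: ReadP a -> String -> Maybe a
parseOnly' p input =
  case readP_to_S p input of
    [] -> Nothing
    results -> Just (fst (minimumBy (comparing (length . snd)) results))
```

With these definitions, parseOnly' atom''' "a/b c" evaluates to Just (ASym "a/b") and parseOnly' atom''' "54321" to Just (AInt 54321).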

You can also easily extend this grammar with more branches in your ADT, a more thoroughly specified symbol parser, or failure labels added with <?>, so that failed parses produce good error messages.
