在Haskell中使用lex解析字符串

Question

我正在关注Haskell教程的简要介绍，并且那里提供的代码似乎已经破解。 我需要了解它是否如此，或者我对这个概念的看法是错误的。

我正在为自定义类型实现解析器：

data Tree a = Leaf a | Branch (Tree a) (Tree a)

打印功能方便

showsTree              :: Show a => Tree a -> String -> String
showsTree (Leaf x)     = shows x
showsTree (Branch l r) = ('<':) . showsTree l . ('|':) . showsTree r . ('>':)

instance Show a => Show (Tree a) where 
    showsPrec _ x = showsTree x

这个解析器很好，但是当有空格时就会中断

readsTree         :: (Read a) => String -> [(Tree a, String)]
readsTree ('<':s) =  [(Branch l r, u) | (l, '|':t) <- readsTree s,
                                        (r, '>':u) <- readsTree t ]
readsTree s       =  [(Leaf x, t)     | (x,t)      <- reads s]

这个被认为是一个更好的解决方案，但没有空间它不起作用

readsTree_lex    :: (Read a) => String -> [(Tree a, String)]
readsTree_lex s  = [(Branch l r, x) | ("<", t) <- lex s,
                                   (l, u)   <- readsTree_lex t,
                                   ("|", v) <- lex u,
                                   (r, w)   <- readsTree_lex v,
                                   (">", x) <- lex w ]
                ++
                [(Leaf x, t)     | (x, t)   <- reads s ]

接下来，我选择一个解析器与read一起使用

instance Read a => Read (Tree a) where
    readsPrec _ s = readsTree s

然后我使用Leksah调试模式在ghci中加载它（这是不相关的，我猜），并尝试解析两个字符串：

    read "<1|<2|3>>"   :: Tree Int -- succeeds with readsTree
    read "<1| <2|3> >" :: Tree Int -- succeeds with readsTree_lex

当lex遇到|<2...前一个字符串的一部分时，它会分裂到("|<", _) 。 那不匹配("|", v) <- lex u解析器的一部分，无法完成解析。

有两个问题出现：

如何定义真正忽略空格的解析器，而不是需要它们？
如何定义用于使用lex拆分遇到的文字的规则

谈到第二个问题 - 人们更多地要求他们好奇，因为定义我自己的词法分析器似乎比定义现有词法分析器的规则更正确。

Answer 1

lex分裂成Haskell lexemes，跳过空白。

这意味着，由于Haskell允许|<作为一个词汇， lex不会将它分成两个lexemes，因为这不是它在Haskell中解析的方式。

如果您对Haskell使用相同（或类似）的语法规则，则只能在解析器中使用lex 。

如果要忽略所有空格（而不是使任何空格等效于一个空格），首先运行filter (not.isSpace)会更简单，更有效。

Answer 2

对此的回答似乎是对Haskell及其代码示例的简洁介绍文本与示例代码中的错误之间的小差距。

还应该有一个词法分析器，但在代码库中没有工作示例（满足我的需要），所以我写了一个。 请指出其中的任何缺陷：

lexAll :: ReadS String
lexAll s = case lex s of
            [("",_)] -> []                                  -- nothing to parse.
            [(c, r)] -> if length c == 1 then [(c, r)]      -- we will try to match
                           else [(c, r), ([head s], tail s)]-- not only as it was 
            any_else -> any_else                            -- parsed but also splitted

作者sais：

最后，完整的读者。 这与以前的版本一样对白色空间不敏感。 当您为数据类型派生Show类时，自动生成的阅读器与样式类似。

但是应该使用lexAll而不是lex（这似乎是错误）：

readsTree' :: (Read a) => ReadS (Tree a)
readsTree' s = [(Branch l r, x) | ("<", t) <- lexAll s,
                  (l, u)   <- readsTree' t,
                                  ("|", v) <- lexAll u,
                                  (r, w)   <- readsTree' v,
                  (">", x) <- lexAll w ]
                ++
                [(Leaf x, t)    | (x, t) <- reads s]

在Haskell中使用lex解析字符串

问题描述

2 个解决方案

解决方案1
4 已采纳 2014-01-16 12:19:48

解决方案2
2 2014-01-16 12:32:14

在Haskell中使用lex解析字符串

问题描述

2 个解决方案

解决方案1 4 已采纳 2014-01-16 12:19:48

解决方案2 2 2014-01-16 12:32:14

解决方案1
4 已采纳 2014-01-16 12:19:48

解决方案2
2 2014-01-16 12:32:14