
Implementing `read` for a left-associative tree in Haskell

I'm having a hard time implementing Read for a tree structure. I want to take a left-associative string (with parens) like ABC(DE)F and convert it into a tree. That particular example corresponds to the tree:

[Image: tree diagram for ABC(DE)F]

Here's the data type I'm using (though I'm open to suggestions):

data Tree = Branch Tree Tree | Leaf Char deriving (Eq)

That particular tree would be, in Haskell:

example = Branch (Branch (Branch (Branch (Leaf 'A')
                                         (Leaf 'B'))
                                 (Leaf 'C'))
                         (Branch (Leaf 'D')
                                 (Leaf 'E')))
                 (Leaf 'F')

My show function looks like:

instance Show Tree where
    show (Branch l r@(Branch _ _)) = show l ++ "(" ++ show r ++ ")"
    show (Branch l r) = show l ++ show r
    show (Leaf x) = [x]

I want to make a read function so that

read "ABC(DE)F" == example

This is a situation where using a parsing library makes the code amazingly short and extremely expressive. (I was amazed that it was so neat when I was experimenting to answer this!)

I'm going to use Parsec (that article provides some links for more information), and use it in "applicative mode" (rather than monadic), since we don't need the extra power (and foot-shooting ability) of monads.

Code

First, the various imports and definitions:

import Text.Parsec

import Control.Applicative ((<*), (<$>))

data Tree = Branch Tree Tree | Leaf Char deriving (Eq, Show)

paren, tree, unit :: Parsec String st Tree

Now, the basic unit of the tree is either a single character (that's not a parenthesis) or a parenthesised tree. The parenthesised tree is just a normal tree between ( and ). And a normal tree is just units combined into branches left-associatively (the three definitions are mutually recursive). In Haskell with Parsec:

-- parenthesised tree or `Leaf <character>`
unit = paren <|> (Leaf <$> noneOf "()") <?> "group or literal"

-- normal tree between ( and )
paren = between (char '(') (char ')') tree  

-- all the units connected up left-associatively
tree = foldl1 Branch <$> many1 unit

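-- parse a tree, then require the end of input, so trailing characters
-- (such as an unmatched ")") are an error instead of being silently ignored
onlyTree :: Parsec String st Tree
onlyTree = tree <* eof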

(Yes, that's the entire parser!)

If we wanted to, we could inline paren and unit, but the code above is very expressive, so we can leave it as is.
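To see why foldl1 Branch produces the left-associative shape, here is a small base-only sketch (no Parsec involved) that folds a list of leaves both ways. The names leftTree and rightTree are hypothetical and exist only for this illustration.

```haskell
data Tree = Branch Tree Tree | Leaf Char deriving (Eq, Show)

-- foldl1 f [a, b, c] == f (f a b) c: earlier units end up deepest on the
-- left, which is the left-associative reading of "ABC"
leftTree :: String -> Tree
leftTree = foldl1 Branch . map Leaf

-- for contrast, foldr1 f [a, b, c] == f a (f b c) nests to the right instead
rightTree :: String -> Tree
rightTree = foldr1 Branch . map Leaf
```

Here leftTree "ABC" gives Branch (Branch (Leaf 'A') (Leaf 'B')) (Leaf 'C'), the same shape as the parser's result for "ABC". Note that foldl1 fails on an empty list, which fits with the parser using many1 rather than many.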

As a brief explanation (I've provided links to the documentation):

  • (<|>) basically means "left parser or right parser" (in Parsec the right parser is only tried if the left one fails without consuming input);
  • (<?>) allows you to make nicer error messages;
  • noneOf will parse any character that's not in the given list of characters;
  • between takes three parsers and returns the value of the third parser, as long as it is delimited by the first and second ones;
  • char parses its argument literally;
  • many1 parses one or more occurrences of its argument into a list (the empty string doesn't appear to be a valid tree, hence many1 rather than many, which parses zero or more);
  • eof matches the end of the input.

We can use the parse function to run the parser (it returns Either ParseError Tree; Left is an error and Right is a correct parse).

As read

Using it as a read-like function could be something like:

read' :: String -> Tree
read' str = case parse onlyTree "" str of
   Right tr -> tr
   Left er -> error (show er)

(I've used read' to avoid conflicting with Prelude.read. If you want a Read instance, you'll have to do a bit more work to implement readPrec (or whatever is required), but it shouldn't be too hard with the actual parsing already done.)
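For the Read instance mentioned above, one option that avoids the Parsec dependency is to port the same unit/paren/tree grammar to Text.ParserCombinators.ReadP from base and lift it into readPrec. This swaps in a different library, so it is only a sketch: unitP, parenP and treeP are hypothetical names mirroring unit, paren and tree, and (as an assumption) whitespace is not accepted as a leaf, so that read's skipping of trailing whitespace cannot produce an ambiguous parse.

```haskell
import Data.Char (isSpace)
import Text.ParserCombinators.ReadP (ReadP, between, char, many1, satisfy, (<++))
import Text.Read (Read (..), lift)

data Tree = Branch Tree Tree | Leaf Char deriving (Eq, Show)

-- a parenthesised group, or a single character that is not a parenthesis
-- (assumption: whitespace is not accepted as a leaf)
unitP :: ReadP Tree
unitP = parenP <++ (Leaf <$> satisfy isLeafChar)
  where isLeafChar c = c `notElem` "()" && not (isSpace c)

-- a normal tree between ( and )
parenP :: ReadP Tree
parenP = between (char '(') (char ')') treeP

-- one or more units, combined left-associatively
treeP :: ReadP Tree
treeP = foldl1 Branch <$> many1 unitP

-- precedence is ignored: this grammar has no operators that need brackets
instance Read Tree where
  readPrec = lift treeP
```

With this instance, read "ABC(DE)F" should equal the question's example. Because ReadP explores all alternatives, reads also returns shorter partial parses (much like running parse tree instead of onlyTree), while read keeps only a parse that consumes the whole input, apart from trailing whitespace.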

Examples

Some basic examples:

*Tree> read' "A"
Leaf 'A'

*Tree> read' "AB"
Branch (Leaf 'A') (Leaf 'B')

*Tree> read' "ABC"
Branch (Branch (Leaf 'A') (Leaf 'B')) (Leaf 'C')

*Tree> read' "A(BC)"
Branch (Leaf 'A') (Branch (Leaf 'B') (Leaf 'C'))

*Tree> read' "ABC(DE)F" == example
True

*Tree> read' "ABC(DEF)" == example
False

*Tree> read' "ABCDEF" == example
False

Demonstrating errors:

*Tree> read' ""
***Exception: (line 1, column 1):
unexpected end of input
expecting group or literal

*Tree> read' "A(B"
***Exception: (line 1, column 4):
unexpected end of input
expecting group or literal or ")"

And finally, the difference between tree and onlyTree:

*Tree> parse tree "" "AB)CD"     -- success: ignores ")CD"
Right (Branch (Leaf 'A') (Leaf 'B'))

*Tree> parse onlyTree "" "AB)CD" -- fail: can't parse the ")"
Left (line 1, column 3):
unexpected ')'
expecting group or literal or end of input

Conclusion

Parsec is amazing! This answer might be long, but its core is just 5 or 6 lines of code that do all the work.

This looks very much like a stack structure. As you scan the input string "ABC(DE)F", wrap each atom you find (any non-parenthesis character) in a Leaf and push it onto an accumulator list. Whenever the list holds 2 items, combine them with Branch. An opening parenthesis starts a recursive read of the inner group, and the matching closing parenthesis ends it. Something like the following sketch gives the idea:

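-- The accumulator holds at most two trees; as soon as it holds two,
-- they are combined into a Branch. The Bool records whether we are
-- currently inside parentheses.
readAcc :: Bool -> [Tree] -> String -> (Tree, String)
readAcc nested [r, l] str = readAcc nested [Branch l r] str
readAcc nested acc (c:cs)
   -- read the inner parenthesis, then carry on with the rest
   | c == '('  = let (result, rest) = readAcc True [] cs
                 in readAcc nested (result : acc) rest
   -- close parenthesis: only valid inside a group holding exactly one tree
   | c == ')', nested, [result] <- acc = (result, cs)
   -- otherwise, add a leaf (a stray ')' falls through to the error case)
   | c /= ')'  = readAcc nested (Leaf c : acc) cs
readAcc False [result] [] = (result, [])
readAcc _ _ _ = error "invalid input"

read' :: String -> Tree
read' = fst . readAcc False []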

This may require some modification, but I think it's enough to get you on the right track.

The Parsec answer by dbaupp is very easy to understand. As an example of a "low-level" approach, here is a hand-written parser that uses a success continuation to handle the left-associative tree building:

import Control.Monad ((>=>))
import Data.Maybe (maybeToList)

data Tree = Branch Tree Tree | Leaf Char deriving (Eq)

instance Read Tree where readsPrec _prec s = maybeToList (readTree s)

-- success continuation: receives a parsed tree and the remaining input
type TreeCont = (Tree,String) -> Maybe (Tree,String)

readTree :: String -> Maybe (Tree,String)
readTree = read'top Just where
  valid ')' = False
  valid '(' = False
  valid _ = True

  read'top :: TreeCont -> String -> Maybe (Tree,String)
  read'top acc s@(x:ys) | valid x =
    case ys of
      [] -> acc (Leaf x,[])
      _ -> read'branch acc s
  read'top _ _ = Nothing

  -- The next three are mutually recursive

  read'branch :: TreeCont -> String -> Maybe (Tree,String)
  read'branch acc (x:y:zs) | valid x = read'right (combine (Leaf x) >=> acc) y zs
  read'branch _ _ = Nothing

  read'right :: TreeCont -> Char -> String -> Maybe (Tree,String)
  read'right acc y ys | valid y = acc (Leaf y,ys)
  read'right acc '(' ys = read'branch (drop'close >=> acc) ys
     where drop'close (b,')':zs) = Just (b,zs)
           drop'close _ = Nothing
  read'right _ _ _ = Nothing  -- assert y==')' here

  combine :: Tree -> TreeCont
  combine build (t, []) = Just (Branch build t,"")
  combine build (t, ys@(')':_)) = Just (Branch build t,ys)  -- stop when lookahead shows ')'
  combine build (t, y:zs) = read'right (combine (Branch build t)) y zs

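Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.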
