Implementing `read` for a left-associative tree in Haskell
I'm having a hard time implementing Read for a tree structure. I want to take a left-associative string (with parens) like ABC(DE)F and convert it into a tree. That particular example corresponds to the tree [tree diagram not preserved].
Here's the data type I'm using (though I'm open to suggestions):
data Tree = Branch Tree Tree | Leaf Char deriving (Eq)
That particular tree would be, in Haskell:
example = Branch (Branch (Branch (Branch (Leaf 'A')
                                         (Leaf 'B'))
                                 (Leaf 'C'))
                         (Branch (Leaf 'D')
                                 (Leaf 'E')))
                 (Leaf 'F')
My show function looks like:
instance Show Tree where
  show (Branch l r@(Branch _ _)) = show l ++ "(" ++ show r ++ ")"
  show (Branch l r) = show l ++ show r
  show (Leaf x) = [x]
I want to make a read function so that
read "ABC(DE)F" == example
This is a situation where using a parsing library makes the code amazingly short and extremely expressive. (I was amazed that it was so neat when I was experimenting to answer this!)
I'm going to use Parsec (that article provides some links for more information), and use it in "applicative mode" (rather than monadic), since we don't need the extra power/foot-shooting-ability of monads.
First the various imports and definitions:
import Text.Parsec
import Control.Applicative ((<*), (<$>))
data Tree = Branch Tree Tree | Leaf Char deriving (Eq, Show)
paren, tree, unit :: Parsec String st Tree
Now, the basic unit of the tree is either a single character (that's not a parenthesis) or a parenthesised tree. The parenthesised tree is just a normal tree between ( and ). And a normal tree is just units combined into branches left-associatively (it's extremely self-recursive). In Haskell with Parsec:
-- parenthesised tree or `Leaf <character>`
unit = paren <|> (Leaf <$> noneOf "()") <?> "group or literal"
-- normal tree between ( and )
paren = between (char '(') (char ')') tree
-- all the units connected up left-associatedly
tree = foldl1 Branch <$> many1 unit
-- attempt to parse the whole input (don't short-circuit on the first error)
onlyTree = tree <* eof
(Yes, that's the entire parser!)
If we wanted to, we could do without paren and unit, but the code above is very expressive, so we can leave it as is.
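To illustrate the remark that paren and unit are optional, here is a minimal sketch with both inlined into one definition. The name compactTree is made up for this example; it is meant to accept the same inputs as tree above, just less readably.

```haskell
import Text.Parsec

data Tree = Branch Tree Tree | Leaf Char deriving (Eq, Show)

-- compactTree is a hypothetical name: the same grammar as `tree`,
-- with the parenthesised group and the leaf parser written inline.
compactTree :: Parsec String st Tree
compactTree =
  foldl1 Branch <$> many1
    (between (char '(') (char ')') compactTree <|> Leaf <$> noneOf "()")
```

For example, parse compactTree "" "A(BC)" gives Right (Branch (Leaf 'A') (Branch (Leaf 'B') (Leaf 'C'))), the same tree the named version produces.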
As a brief explanation (I've provided links to the documentation):
(<|>) basically means "left parser or right parser";
(<?>) allows you to make nicer error messages;
noneOf will parse anything that's not in the given list of characters;
between takes three parsers, and returns the value of the third parser as long as it is delimited by the first and second ones;
char parses its argument literally;
many1 parses one or more of its argument into a list (it appears that the empty string is invalid, hence many1 rather than many, which parses zero or more);
eof matches the end of the input.
We can use the parse function to run the parser (it returns Either ParseError Tree; Left is an error and Right is a correct parse).
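As a small sketch of handling both sides of that Either without throwing, the helper below (describe is a made-up name, not part of Parsec) turns a parse result into a message. It uses errorPos and sourceColumn from Text.Parsec to report where a failure occurred, and repeats the parser definitions so the block stands alone.

```haskell
import Text.Parsec

data Tree = Branch Tree Tree | Leaf Char deriving (Eq, Show)

paren, tree, unit, onlyTree :: Parsec String st Tree
unit     = paren <|> (Leaf <$> noneOf "()") <?> "group or literal"
paren    = between (char '(') (char ')') tree
tree     = foldl1 Branch <$> many1 unit
onlyTree = tree <* eof

-- describe is a hypothetical helper: it reports success or the failing column.
describe :: String -> String
describe input =
  case parse onlyTree "<input>" input of
    Right t  -> "parsed: " ++ show t
    Left err -> "failed at column " ++ show (sourceColumn (errorPos err))
```

With this, describe "AB" yields "parsed: Branch (Leaf 'A') (Leaf 'B')", while describe "A(B" reports a failure at column 4, where the closing parenthesis was expected.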
Using it as a read-like function could be something like:
read' str = case parse onlyTree "" str of
  Right tr -> tr
  Left er -> error (show er)
(I've used read' to avoid conflicting with Prelude.read; if you want a Read instance you'll have to do a bit more work to implement readPrec (or whatever is required) but it shouldn't be too hard with the actual parsing already complete.)
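One possible way to get an actual Read instance, sketched here under the assumption that implementing readsPrec (rather than readPrec) is acceptable: run tree and use Parsec's getInput to hand back whatever input was left unconsumed, which is the shape readsPrec expects. The Show instance is the one from the question, so that show and read can be checked against each other.

```haskell
import Text.Parsec

data Tree = Branch Tree Tree | Leaf Char deriving (Eq)

-- Show instance copied from the question
instance Show Tree where
  show (Branch l r@(Branch _ _)) = show l ++ "(" ++ show r ++ ")"
  show (Branch l r)              = show l ++ show r
  show (Leaf x)                  = [x]

paren, tree, unit :: Parsec String st Tree
unit  = paren <|> (Leaf <$> noneOf "()") <?> "group or literal"
paren = between (char '(') (char ')') tree
tree  = foldl1 Branch <$> many1 unit

instance Read Tree where
  -- Parse one tree from the front of the string and return it together
  -- with the remaining input; a parse error becomes "no parse" ([]).
  readsPrec _ s =
    case parse ((,) <$> tree <*> getInput) "" s of
      Right result -> [result]
      Left _       -> []
```

With this instance, show (read "ABC(DE)F" :: Tree) gives back "ABC(DE)F". Note that the precedence argument is ignored and that whitespace is not skipped: since noneOf "()" accepts any other character, a space in the input becomes a Leaf ' '.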
Some basic examples:
*Tree> read' "A"
Leaf 'A'
*Tree> read' "AB"
Branch (Leaf 'A') (Leaf 'B')
*Tree> read' "ABC"
Branch (Branch (Leaf 'A') (Leaf 'B')) (Leaf 'C')
*Tree> read' "A(BC)"
Branch (Leaf 'A') (Branch (Leaf 'B') (Leaf 'C'))
*Tree> read' "ABC(DE)F" == example
True
*Tree> read' "ABC(DEF)" == example
False
*Tree> read' "ABCDEF" == example
False
Demonstrating errors:
*Tree> read' ""
***Exception: (line 1, column 1):
unexpected end of input
expecting group or literal
*Tree> read' "A(B"
***Exception: (line 1, column 4):
unexpected end of input
expecting group or literal or ")"
And finally, the difference between tree and onlyTree:
*Tree> parse tree "" "AB)CD" -- success: ignores ")CD"
Right (Branch (Leaf 'A') (Leaf 'B'))
*Tree> parse onlyTree "" "AB)CD" -- fail: can't parse the ")"
Left (line 1, column 3):
unexpected ')'
expecting group or literal or end of input
Parsec is amazing! This answer might be long but the core of it is just 5 or 6 lines of code which do all the work.
This very much looks like a stack structure. When you encounter your input string "ABC(DE)F", you Leaf any atom you find (non-parenthesis) and put it in an accumulator list. When you have 2 items in the list, you Branch them together. This could be done with something like (note, untested, just including to give an idea):
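-- the accumulator holds at most two trees, most recent first
read' :: [Tree] -> String -> (Tree, String)
-- two items in the accumulator: combine them into a Branch
read' [r,l] str = read' [Branch l r] str
-- close parenthesis or end of input: the accumulator should be a singleton
read' [result] (')':cs) = (result, cs)
read' [result] [] = (result, [])
read' acc (c:cs)
  -- read the inner parenthesis, then push its result
  | c == '(' = let (result, rest) = read' [] cs
               in read' (result : acc) rest
  -- otherwise, add a leaf
  | c /= ')' = read' (Leaf c : acc) cs
read' _ _ = error "invalid input"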
This may require some modification, but I think it's enough to get you on the right track.
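As a variation on the same accumulator idea (not the code above, and with parseStack being a name invented for this sketch), the stack can be made explicit: one entry per open parenthesis, each holding the tree built so far at that nesting level. It returns Nothing instead of calling error, and rejects leftover or unbalanced input.

```haskell
data Tree = Branch Tree Tree | Leaf Char deriving (Eq, Show)

-- parseStack is a hypothetical name. Each stack entry is the tree built so
-- far at one nesting level (Nothing while that level is still empty).
parseStack :: String -> Maybe Tree
parseStack = go [Nothing]
  where
    -- attach a finished unit to the right of the current level's tree
    push t (Nothing : rest) = Just t : rest
    push t (Just l : rest)  = Just (Branch l t) : rest
    push _ []               = []

    go [Just t] []         = Just t    -- every group closed, input used up
    go _        []         = Nothing   -- empty input or an unclosed '('
    go st       ('(' : cs) = go (Nothing : st) cs
    -- a finished, non-empty group becomes a unit of the enclosing level
    go (Just t : st@(_ : _)) (')' : cs) = go (push t st) cs
    go _        (')' : _)  = Nothing   -- stray ')' or empty group "()"
    go st       (c : cs)   = go (push (Leaf c) st) cs
```

Because each level keeps a single running tree and new units are always attached on the right, the left-associative shape falls out without a separate combining step.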
The parsec answer by dbaupp is very easy to understand. As an example of a "low-level" approach, here is a hand-written parser which uses a success continuation to handle the left-associative tree building:
import Control.Monad ((>=>))
import Data.Maybe (maybeToList)

instance Read Tree where readsPrec _prec s = maybeToList (readTree s)

-- a success continuation: receives the tree parsed so far and the remaining input
type TreeCont = (Tree,String) -> Maybe (Tree,String)

readTree :: String -> Maybe (Tree,String)
readTree = read'top Just where
  valid ')' = False
  valid '(' = False
  valid _ = True

  read'top :: TreeCont -> String -> Maybe (Tree,String)
  read'top acc s@(x:ys) | valid x =
    case ys of
      [] -> acc (Leaf x,[])
      _ -> read'branch acc s
  read'top _ _ = Nothing

  -- The next three are mutually recursive
  read'branch :: TreeCont -> String -> Maybe (Tree,String)
  read'branch acc (x:y:zs) | valid x = read'right (combine (Leaf x) >=> acc) y zs
  read'branch _ _ = Nothing

  read'right :: TreeCont -> Char -> String -> Maybe (Tree,String)
  read'right acc y ys | valid y = acc (Leaf y,ys)
  read'right acc '(' ys = read'branch (drop'close >=> acc) ys
    where drop'close (b,')':zs) = Just (b,zs)
          drop'close _ = Nothing
  read'right _ _ _ = Nothing -- assert y==')' here

  combine :: Tree -> TreeCont
  combine build (t, []) = Just (Branch build t,"")
  combine build (t, ys@(')':_)) = Just (Branch build t,ys) -- stop when lookahead shows ')'
  combine build (t, y:zs) = read'right (combine (Branch build t)) y zs