简体   繁体   中英

Parse string to list in Haskell

I would like to parse string to list in Haskell, but I have no idea how to write it proper. String structure is:

A [B [], C [D [], E []]]

and it represents structure

A-+-B
  |
  `-C-+-D
      |
      `-E

And ideas? Thanks!

Your best bet when parsing in Haskell is almost always Parsec. Here's an example.

import Text.ParserCombinators.Parsec

data Tree a = Tree [Tree a] | Leaf a deriving Show

parseLeaf :: Parser (Tree Char)
parseLeaf = noneOf "[]" >>= return.Leaf

parseNode :: Parser (Tree Char)
parseNode = do
    char '['
    a <- parseTree
    char ']'
    return (Tree a)

parseTree = many1 (parseLeaf <|> parseNode)

Testing this out:

> parseTest parseTree "a[aa]"
 [Leaf 'a',Tree [Leaf 'a',Leaf 'a']]
> parseTest parseTree "a[aa]aaa"
 [Leaf 'a',Tree [Leaf 'a',Leaf 'a'],Leaf 'a',Leaf 'a',Leaf 'a']
> parseTest parseTree "a[aa[aaaaaa]]aaa"
 [Leaf 'a',Tree [Leaf 'a',Leaf 'a',Tree [Leaf 'a',Leaf 'a',Leaf 'a',Leaf 'a',Leaf 'a',Leaf 'a']],Leaf 'a',Leaf 'a',Leaf 'a']

Here's how it works. Parsers in Parsec are monadic, and so support the usual do-notation. (You can write parsers applicatively as well, you can look up how to do that elsewhere.)

We start with a simple data structure

data Tree a = Tree [Tree a] | Leaf a deriving Show

This is not quite the same as what you wanted, but as I wasn't entirely sure what your semantics were, I've used this example. You should be able to adapt it for your purposes.

We then need a parser for each possible part of the tree. A leaf is fairly simple, it is just anything that isn't a bracket:

parseLeaf :: Parser (Tree Char)
parseLeaf = noneOf "[]" >>= return.Leaf

Note that this could have been written out in longhand like

parseLeaf = do
   a <- noneOf "[]"
   return (Leaf a)

Then to parse a branching part of the tree, we need to parse the opening and closing brackets. In between the brackets we can have a whole tree again.

parseNode :: Parser (Tree Char)
parseNode = do
    char '['
    a <- parseTree
    char ']'
    return (Tree a)

So what is parseTree ? It is just many of either of the parsers we have written. The <|> operator allows the parser to choose either parser, whichever parses correctly first. So we have

parseTree = many1 (parseLeaf <|> parseNode)

You should be able to adapt this to your purposes. It looks like your structure might be somewhat more like this:

data Structure a = Node (Structure a) a a | Leaf a

By following the same principles, working out what the parser is needed for each possibility and then combining them, you should be parsing in no time.

UPDATE

Here is a very quick and dirty version of parsing the data structure you asked about. It does not support spaces or commas, but should help demonstrate the basic principle.

data Tree = Tree Char [Tree] deriving Show

parseTree :: Parser Tree
parseTree = do
    character <- noneOf "[]"
    subtree   <- parseSubTree
    return $ Tree character subtree

parseSubTree :: Parser [Tree]
parseSubTree = do
    char '['
    trees <- many parseTree
    char ']'
    return trees

And here is a version with the commas and whitespace added in a fairly simple way. There are a lot of useful combinators in the parsec library that could simplify and improve this, you should investigate them yourself. Note also the applicative style used for the symbol shortcut parser definition. Many people prefer applicative style for parsers and it can be a lot more succinct, so that is worth finding out about as well.

data Tree = Tree Char [Tree] deriving Show

symbol :: String -> Parser String
symbol s = string s <* spaces

parseTree :: Parser Tree
parseTree = do
    character <- noneOf "[]"
    spaces
    subtree <- parseSubTree
    return $ Tree character subtree

parseSubTree :: Parser [Tree]
parseSubTree = do
    symbol "["
    trees <- sepBy parseTree (symbol ",")
    symbol "]"
    return trees

Here it is working:

> parseTest parseTree "A [ A [ B [ ] , C   [ ], D [ ] ] ] "
Tree 'A' [Tree 'A' [Tree 'B' [],Tree 'C' [],Tree 'D' []]]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM