简体   繁体   中英

How to parse this grammar in Parsec? (unusual case of left recursion)

I'm a beginner at Haskell, Parsec, and writing parsers in general. I'm trying to parse a simple language, which (simplifying it further for the sake of this question) consists simply of strings of nested brackets, eg [[][]][] .

I have the Haskell code below, which works fine. However, I would like to extend it so that unmatched brackets will match against the end of the string. So for example, ]][][[ should be equivalent to [[]][][[]] , and []] should be equivalent to [[]] . Doing this for open brackets matching the end of the string is easy, but doing it for closed brackets matching the start of the string results in left recursion and infinite loops, and I haven't figured out a way to resolve that. I am not sure if the reason has to do with the way I'm thinking about the grammar or the way I'm using the Parsec library, but either way I'd appreciate being shown the way forward.

Here is the working code I have:

{-# LANGUAGE NoMonomorphismRestriction #-}

import qualified Text.Parsec as Parsec

import Control.Applicative

-- for testing
parse rule text = Parsec.parse rule "(source)" text

data Expr = Brackets [Expr]
        deriving(Show)

openBracket = Parsec.char '['
closeBracket = Parsec.char ']'

parseBrackets = do
        expr <- Parsec.between openBracket closeBracket parseExpr
        return $ Brackets expr

parseExpr = Parsec.many parseBrackets

If I want closed brackets to match against the end of the string I can just change the definition of closeBracket to

closeBracket = (Parsec.char ']' >> return ()) <|> Parsec.eof

But despite quite a bit of trial and error I haven't found a solution to match unmatched ] s against the start of the string. I know that the usual solution to left recursion in Parsec is the chainl1 function, but that seems to be quite specialised for infix operators and I can't see a way to use it here.

Here's my take on this one:

import qualified Text.Parsec as Parsec
import Text.Parsec.String (Parser)
import Control.Monad (void)
import Control.Applicative

data Expr = Brackets [Expr]
        deriving(Show)

parseTopLevel :: Parser [Expr]
parseTopLevel =
    ((:) <$> parseStart <*> parseExpr) <|> parseExpr

parseStart :: Parser Expr
parseStart = do
    closeBracket
    go (Brackets [])
  where
    go r = (closeBracket *> go (Brackets [r])) <|> return r

parseBrackets :: Parser Expr
parseBrackets = do
        expr <- Parsec.between openBracket closeBracket parseExpr
        return $ Brackets expr

parseExpr :: Parser [Expr]
parseExpr = Parsec.many parseBrackets

openBracket :: Parser ()
openBracket = void $ Parsec.char '['

closeBracket :: Parser ()
closeBracket = (void $ Parsec.char ']') <|> Parsec.eof

As you can see, for parsing the unbalanced brackets at the beginning of the string, I couldn't use any of the combinators that come with parsec, I just wrote my own, parseStart . The rest is pretty much you code.

Here's what it returns on your example:

λ> Parsec.parse parseTopLevel "" "]][][["
Right [Brackets [Brackets []],Brackets [],Brackets [Brackets []]]
λ> Parsec.parse parseTopLevel "" "[[]][][[]]"
Right [Brackets [Brackets []],Brackets [],Brackets [Brackets []]]

As you can see, it returns the exact same thing for ]][][[ and [[]][][[]] as you wanted.

Here is a self-answer, based on redneb's improvements to my code . This version covers cases like []] in which redneb's code ignores the unmatched ] s rather than matching them against the start of the string.

This code works by simply parsing the whole expression as a ] -separated list of balanced expressions, and then explicitly building up the parse tree from that list. This feels somehow like admitting defeat, since the construction of the parse tree happens in a separate step from the actual parsing. Then again that seems to be how chainl1 works, so perhaps it's the "right way" after all. I won't accept my own answer in case anyone else has a better solution.

import qualified Text.Parsec as Parsec
import Text.Parsec.String (Parser)
import Control.Monad (void)
import Control.Applicative

-- for testing
parse rule text = Parsec.parse rule "(source)" text

data Expr = Brackets [Expr]
        deriving(Show)

parseTopLevel :: Parser [Expr]
parseTopLevel = do
        exprList <- parseExprAsList
        return $ composeExpr exprList

composeExpr :: [[Expr]] -> [Expr]
composeExpr [exprList] = exprList
composeExpr (exprList:next:tail) = composeExpr $ (Brackets exprList:next) : tail

parseExprAsList :: Parser [[Expr]]
parseExprAsList = Parsec.sepBy parseBalancedExpr (Parsec.char ']')

parseBalancedExpr :: Parser [Expr]
parseBalancedExpr = Parsec.many parseBrackets

parseBrackets :: Parser Expr
parseBrackets = do
        expr <- Parsec.between openBracket closeBracket parseBalancedExpr
        return $ Brackets expr

openBracket :: Parser ()
openBracket = void $ Parsec.char '['

closeBracket :: Parser ()
closeBracket = (void $ Parsec.char ']') <|> Parsec.eof

Some test cases:

*Main> parse parseTopLevel "[]]"
Right [Brackets [Brackets []]]
*Main> parse parseTopLevel "[[]]"
Right [Brackets [Brackets []]]

*Main> parse parseTopLevel "]["
Right [Brackets [],Brackets []]
*Main> parse parseTopLevel "[][]"
Right [Brackets [],Brackets []]

*Main> parse parseTopLevel "[]][]]]"
Right [Brackets [Brackets [Brackets [Brackets []],Brackets []]]]
*Main> parse parseTopLevel "[[[[]][]]"
Right [Brackets [Brackets [Brackets [Brackets []],Brackets []]]]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM