简体   繁体   中英

Parsing an expression grammar having function application with parser combinators (left-recursion)

As a simplified subproblem of a parser for a real language, I am trying to implement a parser for expressions of a fictional language which looks similar to standard imperative languages (like Python, JavaScript, and so). Its syntax features the following construct:

  • integer numbers
  • identifiers ( [a-zA-Z]+ )
  • arithmetic expressions with + and * and parenthesis
  • structure access with . (eg foo.bar.buz )
  • tuples (eg (1, foo, bar.buz) ) (to remove ambiguity one-tuples are written as (x,) )
  • function application (eg foo(1, bar, buz()) )
  • functions are first class so they can also be returned from other functions and directly be applied (eg foo()() is legal because foo() might return a function)

So a fairly complex program in this language is

(1+2*3, f(4,5,6)(bar) + qux.quux()().quuux)

the associativity is supposed to be

( (1+(2*3)), ( ((f(4,5,6))(bar)) + ((((qux.quux)())()).quuux) ) )

I'm currently using the very nice uu-parsinglib an applicative parser combinator library.

The first problem was obviously that the intuitive expression grammar ( expr -> identifier | number | expr * expr | expr + expr | (expr) is left-recursive. But I could solve that problem using the the pChainl combinator (see parseExpr in the example below).

The remaining problem (hence this question) is function application with functions returned from other functions ( f()() ). Again, the grammar is left recursive expr -> fun-call | ...; fun-call -> expr ( parameter-list ) expr -> fun-call | ...; fun-call -> expr ( parameter-list ) expr -> fun-call | ...; fun-call -> expr ( parameter-list ) . Any ideas how I can solve this problem elegantly using uu-parsinglib ? (the problem should directly apply to parsec , attoparsec and other parser combinators as well I guess).

See below my current version of the program. It works well but function application is only working on identifiers to remove the left-recursion:

 {-# LANGUAGE FlexibleContexts #-}
 {-# LANGUAGE RankNTypes #-}

 module TestExprGrammar
     (
     ) where

 import Data.Foldable (asum)
 import Data.List (intercalate)
 import Text.ParserCombinators.UU
 import Text.ParserCombinators.UU.Utils
 import Text.ParserCombinators.UU.BasicInstances

 data Node =
     NumberLiteral Integer
     | Identifier String
     | Tuple [Node]
     | MemberAccess Node Node
     | FunctionCall Node [Node]
     | BinaryOperation String Node Node

 parseFunctionCall :: Parser Node
 parseFunctionCall =
     FunctionCall <$>
         parseIdentifier {- `parseExpr' would be correct but left-recursive -}
         <*> parseParenthesisedNodeList 0

 operators :: [[(Char, Node -> Node -> Node)]]
 operators = [ [('+', BinaryOperation "+")]
             , [('*' , BinaryOperation "*")]
             , [('.', MemberAccess)]
             ]

 samePrio :: [(Char, Node -> Node -> Node)] -> Parser (Node -> Node -> Node)
 samePrio ops = asum [op <$ pSym c <* pSpaces | (c, op) <- ops]

 parseExpr :: Parser Node
 parseExpr =
     foldr pChainl
           (parseIdentifier
           <|> parseNumber
           <|> parseTuple
           <|> parseFunctionCall
           <|> pParens parseExpr
           )
           (map samePrio operators)

 parseNodeList :: Int -> Parser [Node]
 parseNodeList n =
     case n of
       _ | n < 0 -> parseNodeList 0
       0 -> pListSep (pSymbol ",") parseExpr
       n -> (:) <$>
           parseExpr
           <* pSymbol ","
           <*> parseNodeList (n-1)

 parseParenthesisedNodeList :: Int -> Parser [Node]
 parseParenthesisedNodeList n = pParens (parseNodeList n)

 parseIdentifier :: Parser Node
 parseIdentifier = Identifier <$> pSome pLetter <* pSpaces

 parseNumber :: Parser Node
 parseNumber = NumberLiteral <$> pNatural

 parseTuple :: Parser Node
 parseTuple =
     Tuple <$> parseParenthesisedNodeList 1
     <|> Tuple [] <$ pSymbol "()"

 instance Show Node where
     show n =
         let showNodeList ns = intercalate ", " (map show ns)
             showParenthesisedNodeList ns = "(" ++ showNodeList ns ++ ")"
         in case n of
              Identifier i -> i
              Tuple ns -> showParenthesisedNodeList ns
              NumberLiteral n -> show n
              FunctionCall f args -> show f ++ showParenthesisedNodeList args
              MemberAccess f g -> show f ++ "." ++ show g
              BinaryOperation op l r -> "(" ++ show l ++ op ++ show r ++ ")"

I don't know this library but can show you how to remove left recursion. The standard right recursive expression grammar is

E -> T E'
E' -> + TE'  |  eps
T -> F T'
T' -> * FT'  |  eps
F -> NUMBER | ID | ( E ) 

To add function application you must decide its level of precedence. In most languages I've seen it is highest. So you'd add another layer of productions for function application.

E -> T E'
E' -> + TE'  |  eps
T -> AT'
T' -> * A T' |  eps
A -> F A'
A' -> ( E ) A' | eps
F -> NUMBER | ID | ( E ) 

Yes this is a hairy-looking grammar and bigger than the left recursive one. That's the price of top-down predictive parsing. If you want simpler grammars use a bottom up parser generator a la yacc.

Looking briefly at the list-like combinators for uu-parsinglib (I'm more familiar with parsec ), I think you can solve this by folding over the result of the pSome combinator:

 parseFunctionCall :: Parser Node
 parseFunctionCall =
     foldl' FunctionCall <$>
         parseIdentifier {- `parseExpr' would be correct but left-recursive -}
         <*> pSome (parseParenthesisedNodeList 0)

This is also equivalent to the Alternative some combinator, which should indeed apply to the other parsing libs you mentioned.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM