简体   繁体   中英

Parsing expressions inside arithmetic expressions

I would like to parse arithmetic expressions.

Here is my current parser:

data AExpr
    = ExprAsAExpr Expr
    | IntConst Integer
    | Neg AExpr
    | ABinary ABinOp AExpr AExpr
    deriving (Show, Eq)

aExpr :: Parser AExpr
aExpr = makeExprParser aTerm aOperators

aTerm :: Parser AExpr
aTerm
    =   parens aExpr
    <|> IntConst <$> integerParser

aOperators :: [[Operator Parser AExpr]]
aOperators =
    [ [Prefix (Neg <$ symbol "-") ]
    , [ InfixL (ABinary Multiply <$ symbol "*")
      , InfixL (ABinary Divide   <$ symbol "/") ]
    , [ InfixL (ABinary Add      <$ symbol "+")
      , InfixL (ABinary Subtract <$ symbol "-") ]
    ]

Using this I can correctly parse this:

1 + 2

Generating an AST like this.

(ABinary Add (IntConst 1) (IntConst 2))

Another thing I can parse are general expressions. These can be things such as variables, method calls, ternaries etc.

Eg

Identifiers:

varName

This generates:

(Identifier (Name "varName"))

Method Calls:

methodCall()

This generates:

(MethodCall (Name "methodCall") (BlockExpr []))

Here's an example for parsing general expressions.

expressionParser :: Parser Expr
expressionParser
    =   methodCallParser
    <|> identifierParser

This works fine but I would also like to parse arithmetic expressions in this.

expressionParser :: Parser Expr
expressionParser
    =   newClassInstanceParser
    <|> methodCallParser
    <|> AExprAsExpr <$> aExpr
    <|> identifierParser

This means using the expressionParser I can now parse all the different expressions including arithmetic expressions. If it happens to be an arithmetic expression it gets wrapped in AExprAsExpr .

Problem

I would like to parse arithmetic expressions containing other expressions.

Eg

x + y

To do this my original thought was to change the arithmetic parser so it also parses expressions.

data AExpr
    = ExprAsAExpr Expr
    | IntConst Integer
    | Neg AExpr
    | ABinary ABinOp AExpr AExpr
    deriving (Show, Eq)

aExpr :: Parser AExpr
aExpr = makeExprParser aTerm aOperators

aTerm :: Parser AExpr
aTerm
    =   parens aExpr
    <|> IntConst <$> integerParser
    <|> ExprAsAExpr <$> expressionParser

aOperators :: [[Operator Parser AExpr]]
aOperators =
    [ [Prefix (Neg <$ symbol "-") ]
    , [ InfixL (ABinary Multiply <$ symbol "*")
      , InfixL (ABinary Divide   <$ symbol "/") ]
    , [ InfixL (ABinary Add      <$ symbol "+")
      , InfixL (ABinary Subtract <$ symbol "-") ]
    ]

The problem with this is there is a recursive loop as aTerm calls the expression parser, the expression parser calls aExpr . This leads to an infinite loop. There is also the issue that all identifiers will now be wrapped inside an AExprAsExpr .

What is the correct method of parsing expressions inside arithmetic expressions?

EDIT I just now realized that you are using makeExpressionParser and my answer doesn't really apply to that. Anyway maybe this answer is still helpful?

Parsec is a type of recursive-descent parser, which means it cannot handle left recursion, as you are seeing. You need to factor it out, which can always be done if the grammar is context-free. One way you see this factorization done is by having a production for each precedence level. Here is an example grammar for simple arithmetic:

expr ::= addExpr
addExpr ::= mulExpr '+' addExpr
          | mulExpr '-' addExpr
          | mulExpr
mulExpr ::= term '*' mulExpr
          | term '/' mulExpr
          | term
term ::= '(' expr ')'
       | number

Notice the pattern: the first symbol in each production calls down to the next more specific one. Then explicit parentheses allow us to re-enter the top-level production. This is generally how operator precedence is expressed in recursive descent.

The above grammar can only produce right-nested expressions. While it will accept exactly the right strings, it does not correctly parse when the interpretation is left-associative. In particular,

1 - 2 - 3 - 4

will be parsed as

1 - (2 - (3 - 4))

which is not correct according to our conventions. In a general recursive-descent parser you have to do some tricks to associate correctly here. In Parsec, however, we have the many combinators, which we can use to our advantage. For example, to parse left-associated subtraction, we could say

subExpr = foldl1 (-) <$> many1 mulExpr

The next level here are apparently the chainl combinators which seem to have been designed for just this purpose (though I just learned about it now -- guess I should have perused the docs more). An example of using this would be

addExpr = chainl1 mulExpr oper
    where
    oper = choice [ (+) <$ symbol '+'
                  , (-) <$ symbol '-'
                  ]

I love writing parsers in Haskell. Good luck!

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM