简体   繁体   中英

Invalid exception messages from parser combinators in Haskell

I'm studying functional programming using Haskell language. And as an exercise I need to implement a function parsing a primitive arithmetic expression from String . The function must be able to handle double literals, operations + , - , * , / with the usual precedence and parentheses.

parseExpr :: String -> Except ParseError Expr

with next defined data types:

data ParseError = ErrorAtPos Natural
  deriving Show

newtype Parser a = P (ExceptState ParseError (Natural, String) a)
  deriving newtype (Functor, Applicative, Monad)

data Prim a
  = Add a a 
  | Sub a a 
  | Mul a a 
  | Div a a 
  | Abs a   
  | Sgn a
  deriving Show

data Expr
  = Val Double      
  | Op (Prim Expr)  
  deriving Show

Where ExceptState is a modified State monad, allowing to throw exception pointing at the error position.

data Annotated e a = a :# e
  deriving Show
infix 0 :#

data Except e a = Error e | Success a 
  deriving Show

data ExceptState e s a = ES { runES :: s -> Except e (Annotated s a) }

Also ExceptState has defined Functor , Applicative and Monad instances, which were thoroughly tested earlier, so I am positive in their correctness.

instance Functor (ExceptState e s) where
  fmap func ES{runES = runner} = ES{runES = \s ->
    case (runner s) of
      Error err   -> Error err
      Success ans -> Success (mapAnnotated func $ ans) }

instance Applicative (ExceptState e s) where
  pure arg = ES{runES = \s -> Success (arg :# s)}
  p <*> q = Control.Monad.ap p q

instance Monad (ExceptState e s) where
  m >>= f = joinExceptState (fmap f m)
    where
      joinExceptState :: ExceptState e s (ExceptState e s a) -> ExceptState e s a
      joinExceptState ES{runES = runner} = ES{runES = \s ->
        case (runner s) of
          Error err -> Error err
          Success (ES{runES = runner2} :# s2) ->
            case (runner2 s2) of
              Error err           -> Error err
              Success (res :# s3) -> Success (res :# s3) }

To implement the function parseExpr I used basic parser combinators:

pChar :: Parser Char
pChar = P $ ES $ \(pos, s) ->
  case s of
    []     -> Error (ErrorAtPos pos)
    (c:cs) -> Success (c :# (pos + 1, cs))

parseError :: Parser a
parseError = P $ ES $ \(pos, _) -> Error (ErrorAtPos pos)

instance Alternative Parser where
  empty = parseError

  (<|>) (P(ES{runES = runnerP})) (P(ES{runES = runnerQ})) =
    P $ ES $ \(pos, s) ->
      case runnerP (pos, s) of
        Error _     -> runnerQ (pos, s)
        Success res -> Success res

instance MonadPlus Parser

which were used to construct more complex ones:

-- | elementary parser not consuming a character, failing if input doesn't
-- reach its end
pEof :: Parser ()
pEof = P $ ES $ \(pos, s) ->
  case s of
    [] -> Success (() :# (pos, []))
    _  -> Error $ ErrorAtPos pos

-- | parses a single digit value
parseVal :: Parser Expr
parseVal = Val <$> (fromIntegral . digitToInt) <$> mfilter isDigit pChar

-- | parses an expression inside parenthises
pParenth :: Parser Expr
pParenth = do
  void $ mfilter (== '(') pChar
  expr <- parseAddSub
  (void $ mfilter (== ')') pChar) <|> parseError
  return expr

-- | parses the most prioritised operations
parseTerm :: Parser Expr
parseTerm = pParenth <|> parseVal

parseAddSub :: Parser Expr
parseAddSub = do
  x <- parseTerm
  ys <- many parseSecond
  return $ foldl (\acc (sgn, y) -> Op $
    (if sgn == '+' then Add else Sub) acc y) x ys

  where
    parseSecond :: Parser (Char, Expr)
    parseSecond = do
      sgn <- mfilter ((flip elem) "+-") pChar
      y <- parseTerm <|> parseError
      return (sgn, y)

-- | Parses the whole expression. Begins from parsing on +, - level and
-- successfully consuming the whole string.
pExpr :: Parser Expr
pExpr = do
  expr <- parseAddSub
  pEof
  return expr

-- | More convinient way to run 'pExpr' parser
parseExpr :: String -> Except ParseError Expr
parseExpr = runP pExpr

As a result, at this point function works as intended if given String expression is valid:

ghci> parseExpr "(2+3)-1"
Success (Op (Sub (Op (Add (Val 2.0) (Val 3.0))) (Val 1.0)))
ghci> parseExpr "(2+3-1)-1"
Success (Op (Sub (Op (Sub (Op (Add (Val 2.0) (Val 3.0))) (Val 1.0))) (Val 1.0)))

Otherwise ErrorAtPos does not point at the necessary position:

ghci> parseExpr "(2+)-1"
Error (ErrorAtPos 1)
ghci> parseExpr "(2+3-)-1"
Error (ErrorAtPos 1)

What am I doing wrong here? Thank you in advance.

My main assumption was that something wrong was with function (<|>) of Alternative Parser and it incorrectly changed pos variable.

  (<|>) (P(ES{runES = runnerP})) (P(ES{runES = runnerQ})) =
    P $ ES $ \(pos, s) ->
      case runnerP (pos, s) of
        -- Error _     -> runnerQ (pos, s)
        Error (ErrorAtPos pos')     -> runnerQ (pos' + pos, s)
        Success res -> Success res

But it led to more strange results:

ghci> parseExpr "(5+)-3"
Error (ErrorAtPos 84)
ghci> parseExpr "(5+2-)-3"
Error (ErrorAtPos 372)

Then more doubts were aimed at joinExceptState function of instance Monad (ExceptState es) in spite of everything I've run it through, doubts that it wasn't working on s of (Natural, String) type as I indented in this case. But then I can't really change it for this concrete type only.

Excellent question, although it would have been even better if it really included all your code. I filled in the missing pieces:

mapAnnotated :: (a -> b) -> Annotated s a -> Annotated s b
mapAnnotated f (a :# e) = (f a) :# e

runP :: Parser a -> String -> Except ParseError a
runP (P (ES {runES = p})) s = case p (0, s) of
  Error e -> Error e
  Success (a :# e) -> Success a

Why is parseExpr "(5+)-3" equal to Error (ErrorAtPos 1) ? Here's what happens: we call parseExpr which (ultimately) calls parseTerm which is just pParenth <|> parseVal . pParenth fails, of course, so we look at the definition of <|> to work out what to do. That definition says: if the thing on the left fails, try the thing on the right. So we try the thing on the right (ie parseVal ), which also fails, and we report the second error, which is in fact at position 1.

To see this more clearly, you can just replace pParenth <|> parseVal with parseVal <|> pParenth and observe that you get ErrorAtPos 2 instead.

This is almost certainly not the behaviour you want. The documentation of Megaparsec's p <|> q , here , says:

If [parser] p fails without consuming any input , parser q is tried.

(emphasis in original, meaning: parser q is not tried in other cases). This is a more useful thing to do. If you got reasonably far trying to parse a parenthesised expression and then got an error, probably you want to report that error rather than complaining that '(' isn't a digit.

Since you say this is an exercise, I'm not going to tell you how to fix the problem. I'll tell you some other stuff, though.

First, this is not your only issue with error reporting. Above we see that parseVal "(1" reports an error at position 1 ( after the problematic character, which is at position 0) whereas pParenth "(5+)-3" reports an error at position 2 ( before the problematic character, which is at position 3). Ideally, both should give the position of the problematic character itself. (Of course, it'd be even better if the parser stated what character it expected, but that's more difficult to do.)

Second, the way I found the problem was to import Debug.Trace , replace your definition of pChar with

pChar :: Parser Char
pChar = P $ ES $ \(pos, s) -> traceShow (pos, s) $
  case s of
    []     -> Error (ErrorAtPos pos)
    (c:cs) -> Success (c :# (pos + 1, cs))

and mull over the output for a bit. Debug.Trace is sometimes less useful than one hopes, because of lazy evaluation, but for a program like this it can help a lot.

Third, if you modify your definition of <|> to match Megaparsec's does, you might need Megaparsec's try combinator. (Not for the grammar you're trying to parse now , but maybe later.) try solves the issue that

(singleChar 'p' *> singleChar 'q') <|> (singleChar 'p' *> singleChar 'r')

fails on the string "pr" with Megaparsec's <|> .

Fourth, you sometimes write someParser <|> parseError , which I think is equivalent to someParser for both your definition of <|> and Megaparsec's.

Fifth, you don't need void ; just ignore the result, it's the same thing.

Sixth, your Except seems to just be Either .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM