Invalid exception messages from parser combinators in Haskell

Question

I'm studying functional programming in Haskell. As an exercise, I need to implement a function that parses a primitive arithmetic expression from a String. The function must handle double literals, the operations +, -, *, / with the usual precedence, and parentheses.

parseExpr :: String -> Except ParseError Expr

with the following data types:

data ParseError = ErrorAtPos Natural
  deriving Show

newtype Parser a = P (ExceptState ParseError (Natural, String) a)
  deriving newtype (Functor, Applicative, Monad)

data Prim a
  = Add a a 
  | Sub a a 
  | Mul a a 
  | Div a a 
  | Abs a   
  | Sgn a
  deriving Show

data Expr
  = Val Double      
  | Op (Prim Expr)  
  deriving Show

Here ExceptState is a modified State monad that can throw an exception pointing at the error position.

data Annotated e a = a :# e
  deriving Show
infix 0 :#

data Except e a = Error e | Success a 
  deriving Show

data ExceptState e s a = ES { runES :: s -> Except e (Annotated s a) }

ExceptState also has Functor, Applicative and Monad instances, which were thoroughly tested earlier, so I am confident they are correct.

instance Functor (ExceptState e s) where
  fmap func ES{runES = runner} = ES{runES = \s ->
    case (runner s) of
      Error err   -> Error err
      Success ans -> Success (mapAnnotated func $ ans) }

instance Applicative (ExceptState e s) where
  pure arg = ES{runES = \s -> Success (arg :# s)}
  p <*> q = Control.Monad.ap p q

instance Monad (ExceptState e s) where
  m >>= f = joinExceptState (fmap f m)
    where
      joinExceptState :: ExceptState e s (ExceptState e s a) -> ExceptState e s a
      joinExceptState ES{runES = runner} = ES{runES = \s ->
        case (runner s) of
          Error err -> Error err
          Success (ES{runES = runner2} :# s2) ->
            case (runner2 s2) of
              Error err           -> Error err
              Success (res :# s3) -> Success (res :# s3) }

To implement parseExpr I used these basic parser combinators:

pChar :: Parser Char
pChar = P $ ES $ \(pos, s) ->
  case s of
    []     -> Error (ErrorAtPos pos)
    (c:cs) -> Success (c :# (pos + 1, cs))

parseError :: Parser a
parseError = P $ ES $ \(pos, _) -> Error (ErrorAtPos pos)

instance Alternative Parser where
  empty = parseError

  (<|>) (P(ES{runES = runnerP})) (P(ES{runES = runnerQ})) =
    P $ ES $ \(pos, s) ->
      case runnerP (pos, s) of
        Error _     -> runnerQ (pos, s)
        Success res -> Success res

instance MonadPlus Parser

which were used to construct more complex ones:

-- | elementary parser not consuming a character, failing if input doesn't
-- reach its end
pEof :: Parser ()
pEof = P $ ES $ \(pos, s) ->
  case s of
    [] -> Success (() :# (pos, []))
    _  -> Error $ ErrorAtPos pos

-- | parses a single digit value
parseVal :: Parser Expr
parseVal = Val <$> (fromIntegral . digitToInt) <$> mfilter isDigit pChar

-- | parses an expression inside parentheses
pParenth :: Parser Expr
pParenth = do
  void $ mfilter (== '(') pChar
  expr <- parseAddSub
  (void $ mfilter (== ')') pChar) <|> parseError
  return expr

-- | parses the most prioritised operations
parseTerm :: Parser Expr
parseTerm = pParenth <|> parseVal

parseAddSub :: Parser Expr
parseAddSub = do
  x <- parseTerm
  ys <- many parseSecond
  return $ foldl (\acc (sgn, y) -> Op $
    (if sgn == '+' then Add else Sub) acc y) x ys

  where
    parseSecond :: Parser (Char, Expr)
    parseSecond = do
      sgn <- mfilter ((flip elem) "+-") pChar
      y <- parseTerm <|> parseError
      return (sgn, y)

-- | Parses the whole expression. Begins from parsing on +, - level and
-- successfully consuming the whole string.
pExpr :: Parser Expr
pExpr = do
  expr <- parseAddSub
  pEof
  return expr

-- | More convenient way to run 'pExpr' parser
parseExpr :: String -> Except ParseError Expr
parseExpr = runP pExpr

As a result, the function currently works as intended if the given String expression is valid:

ghci> parseExpr "(2+3)-1"
Success (Op (Sub (Op (Add (Val 2.0) (Val 3.0))) (Val 1.0)))
ghci> parseExpr "(2+3-1)-1"
Success (Op (Sub (Op (Sub (Op (Add (Val 2.0) (Val 3.0))) (Val 1.0))) (Val 1.0)))

Otherwise, ErrorAtPos does not point at the right position:

ghci> parseExpr "(2+)-1"
Error (ErrorAtPos 1)
ghci> parseExpr "(2+3-)-1"
Error (ErrorAtPos 1)

What am I doing wrong here? Thank you in advance.

My main assumption was that something was wrong with the (<|>) function of Alternative Parser and that it changed the pos variable incorrectly, so I tried this modification:

  (<|>) (P(ES{runES = runnerP})) (P(ES{runES = runnerQ})) =
    P $ ES $ \(pos, s) ->
      case runnerP (pos, s) of
        -- Error _     -> runnerQ (pos, s)
        Error (ErrorAtPos pos')     -> runnerQ (pos' + pos, s)
        Success res -> Success res

But that led to even stranger results:

ghci> parseExpr "(5+)-3"
Error (ErrorAtPos 84)
ghci> parseExpr "(5+2-)-3"
Error (ErrorAtPos 372)

Then my doubts turned to the joinExceptState function in instance Monad (ExceptState e s): despite all the tests I've run it through, I suspected that it wasn't handling an s of type (Natural, String) the way I intended in this case. But I can't really change it for this one concrete type only.

Answer 1

Excellent question, although it would have been even better if it had included all of your code. I filled in the missing pieces:

mapAnnotated :: (a -> b) -> Annotated s a -> Annotated s b
mapAnnotated f (a :# e) = (f a) :# e

runP :: Parser a -> String -> Except ParseError a
runP (P (ES {runES = p})) s = case p (0, s) of
  Error e -> Error e
  Success (a :# e) -> Success a

Why is parseExpr "(5+)-3" equal to Error (ErrorAtPos 1)? Here's what happens: we call parseExpr, which (ultimately) calls parseTerm, which is just pParenth <|> parseVal. pParenth fails, of course, so we look at the definition of <|> to work out what to do. That definition says: if the thing on the left fails, try the thing on the right. So we try the thing on the right (i.e. parseVal), which also fails, and we report the second error, which is in fact at position 1.

To see this more clearly, you can replace pParenth <|> parseVal with parseVal <|> pParenth and observe that you get ErrorAtPos 2 instead.

This is almost certainly not the behaviour you want. The documentation of Megaparsec's p <|> q says:

If [parser] p fails without consuming any input, parser q is tried.

(The emphasis on "without consuming any input" is in the original, meaning: parser q is not tried in other cases.) This is a more useful thing to do. If you got reasonably far trying to parse a parenthesised expression and then got an error, you probably want to report that error rather than complain that '(' isn't a digit.

Since you say this is an exercise, I'm not going to tell you how to fix the problem. I'll tell you some other stuff, though.

First, this is not your only issue with error reporting. Above we see that parseVal "(1" reports an error at position 1 (after the problematic character, which is at position 0), whereas pParenth "(5+)-3" reports an error at position 2 (before the problematic character, which is at position 3). Ideally, both should give the position of the problematic character itself. (Of course, it'd be even better if the parser stated what character it expected, but that's more difficult to do.)

Second, the way I found the problem was to import Debug.Trace, replace your definition of pChar with

pChar :: Parser Char
pChar = P $ ES $ \(pos, s) -> traceShow (pos, s) $
  case s of
    []     -> Error (ErrorAtPos pos)
    (c:cs) -> Success (c :# (pos + 1, cs))

and mull over the output for a bit. Debug.Trace is sometimes less useful than one hopes, because of lazy evaluation, but for a program like this it can help a lot.

Third, if you modify your definition of <|> to match Megaparsec's, you might need Megaparsec's try combinator. (Not for the grammar you're trying to parse now, but maybe later.) try solves the issue that

(singleChar 'p' *> singleChar 'q') <|> (singleChar 'p' *> singleChar 'r')

fails on the string "pr" with Megaparsec's <|>.
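To make that failure concrete, here is a minimal toy model using only base. Everything in it (Toy, run, withoutTry, withTry, and these versions of singleChar and try) is hypothetical: it is neither your Parser type nor Megaparsec's implementation. As a simplification, "consumed input" is detected by comparing the failure position with the starting position, and try simply rewinds the failure position.

```haskell
import Control.Applicative (Alternative (..))

-- Hypothetical toy parser: the state is (position, remaining input),
-- and a failure carries the position at which it occurred.
newtype Toy a = Toy { runToy :: (Int, String) -> Either Int (a, (Int, String)) }

instance Functor Toy where
  fmap f (Toy p) = Toy $ \st -> case p st of
    Left e         -> Left e
    Right (a, st') -> Right (f a, st')

instance Applicative Toy where
  pure a = Toy $ \st -> Right (a, st)
  Toy pf <*> Toy pa = Toy $ \st -> case pf st of
    Left e         -> Left e
    Right (f, st') -> case pa st' of
      Left e          -> Left e
      Right (a, st'') -> Right (f a, st'')

instance Alternative Toy where
  empty = Toy $ \(pos, _) -> Left pos
  Toy p <|> Toy q = Toy $ \st@(pos, _) -> case p st of
    Left e | e == pos -> q st   -- p failed without consuming input: try q
    other             -> other  -- success, or failure after consuming: keep p's result

-- Fails at the position of the offending character, without consuming it.
singleChar :: Char -> Toy Char
singleChar c = Toy $ \(pos, s) -> case s of
  (x:xs) | x == c -> Right (x, (pos + 1, xs))
  _               -> Left pos

-- Rewinds a failure to the starting position, so that <|> may backtrack.
try :: Toy a -> Toy a
try (Toy p) = Toy $ \st@(pos, _) -> case p st of
  Left _ -> Left pos
  ok     -> ok

withoutTry, withTry :: Toy Char
withoutTry = (singleChar 'p' *> singleChar 'q') <|> (singleChar 'p' *> singleChar 'r')
withTry    = try (singleChar 'p' *> singleChar 'q') <|> (singleChar 'p' *> singleChar 'r')

-- Runs a toy parser from position 0 and drops the final state.
run :: Toy a -> String -> Either Int a
run p s = fmap fst (runToy p (0, s))
```

In this model, run withoutTry "pr" is Left 1, because the left branch consumed 'p' before failing, so the right branch is never tried. run withTry "pr" is Right 'r'.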

Fourth, you sometimes write someParser <|> parseError, which I think is equivalent to someParser for both your definition of <|> and Megaparsec's. (With your definition it accepts exactly the same input, although when someParser fails, the reported position is reset to where the alternative started.)

Fifth, you don't need void; just ignore the result, it's the same thing.
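As a small illustration of that point, here is a sketch that uses Maybe as a stand-in for your Parser (the names withVoid and withWildcard are made up for this example). Both functions behave identically. A bare statement with no binding at all also works, though GHC's -Wall then emits a -Wunused-do-bind warning, which the wildcard binding avoids.

```haskell
import Control.Monad (mfilter, void)

-- Discards the matched character explicitly with void.
withVoid :: Maybe Char -> Maybe Int
withVoid next = do
  void $ mfilter (== '(') next
  return 1

-- Discards the matched character with a wildcard binding instead.
withWildcard :: Maybe Char -> Maybe Int
withWildcard next = do
  _ <- mfilter (== '(') next
  return 1
```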

Sixth, your Except seems to just be Either.
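One way to see the correspondence is to write the two conversions, which are inverses of each other. The names toEither and fromEither are made up for this sketch, and the Eq instance is added here only so the values can be compared.

```haskell
-- Same shape as the Except type in the question (Eq is an addition here).
data Except e a = Error e | Success a
  deriving (Show, Eq)

-- Error corresponds to Left, Success to Right.
toEither :: Except e a -> Either e a
toEither (Error e)   = Left e
toEither (Success a) = Right a

fromEither :: Either e a -> Except e a
fromEither (Left e)  = Error e
fromEither (Right a) = Success a
```

Using Either directly would also give you its existing Functor, Applicative and Monad instances.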
