Invalid exception messages from parser combinators in Haskell
I'm studying functional programming using Haskell. As an exercise, I need to implement a function that parses a primitive arithmetic expression from a String. The function must be able to handle double literals, the operations +, -, *, / with the usual precedence, and parentheses.
parseExpr :: String -> Except ParseError Expr
with the following data types:
data ParseError = ErrorAtPos Natural
  deriving Show

newtype Parser a = P (ExceptState ParseError (Natural, String) a)
  deriving newtype (Functor, Applicative, Monad)

data Prim a
  = Add a a
  | Sub a a
  | Mul a a
  | Div a a
  | Abs a
  | Sgn a
  deriving Show

data Expr
  = Val Double
  | Op (Prim Expr)
  deriving Show
Here ExceptState is a modified State monad that allows throwing an exception pointing at the error position.
data Annotated e a = a :# e
  deriving Show
infix 0 :#

data Except e a = Error e | Success a
  deriving Show

data ExceptState e s a = ES { runES :: s -> Except e (Annotated s a) }
ExceptState also has Functor, Applicative and Monad instances, which were thoroughly tested earlier, so I am confident that they are correct.
instance Functor (ExceptState e s) where
  fmap func ES{runES = runner} = ES{runES = \s ->
    case (runner s) of
      Error err   -> Error err
      Success ans -> Success (mapAnnotated func $ ans) }

instance Applicative (ExceptState e s) where
  pure arg = ES{runES = \s -> Success (arg :# s)}
  p <*> q = Control.Monad.ap p q

instance Monad (ExceptState e s) where
  m >>= f = joinExceptState (fmap f m)
    where
      joinExceptState :: ExceptState e s (ExceptState e s a) -> ExceptState e s a
      joinExceptState ES{runES = runner} = ES{runES = \s ->
        case (runner s) of
          Error err -> Error err
          Success (ES{runES = runner2} :# s2) ->
            case (runner2 s2) of
              Error err -> Error err
              Success (res :# s3) -> Success (res :# s3) }
To implement the function parseExpr I used basic parser combinators:
pChar :: Parser Char
pChar = P $ ES $ \(pos, s) ->
  case s of
    []     -> Error (ErrorAtPos pos)
    (c:cs) -> Success (c :# (pos + 1, cs))

parseError :: Parser a
parseError = P $ ES $ \(pos, _) -> Error (ErrorAtPos pos)

instance Alternative Parser where
  empty = parseError
  (<|>) (P(ES{runES = runnerP})) (P(ES{runES = runnerQ})) =
    P $ ES $ \(pos, s) ->
      case runnerP (pos, s) of
        Error _     -> runnerQ (pos, s)
        Success res -> Success res

instance MonadPlus Parser
which were used to construct more complex ones:
-- | elementary parser not consuming a character, failing if input doesn't
-- reach its end
pEof :: Parser ()
pEof = P $ ES $ \(pos, s) ->
  case s of
    [] -> Success (() :# (pos, []))
    _  -> Error $ ErrorAtPos pos

-- | parses a single digit value
parseVal :: Parser Expr
parseVal = Val <$> (fromIntegral . digitToInt) <$> mfilter isDigit pChar

-- | parses an expression inside parentheses
pParenth :: Parser Expr
pParenth = do
  void $ mfilter (== '(') pChar
  expr <- parseAddSub
  (void $ mfilter (== ')') pChar) <|> parseError
  return expr

-- | parses the most prioritised operations
parseTerm :: Parser Expr
parseTerm = pParenth <|> parseVal

parseAddSub :: Parser Expr
parseAddSub = do
  x <- parseTerm
  ys <- many parseSecond
  return $ foldl (\acc (sgn, y) -> Op $
    (if sgn == '+' then Add else Sub) acc y) x ys
  where
    parseSecond :: Parser (Char, Expr)
    parseSecond = do
      sgn <- mfilter ((flip elem) "+-") pChar
      y <- parseTerm <|> parseError
      return (sgn, y)

-- | Parses the whole expression. Begins from parsing on +, - level and
-- successfully consuming the whole string.
pExpr :: Parser Expr
pExpr = do
  expr <- parseAddSub
  pEof
  return expr

-- | More convenient way to run 'pExpr' parser
parseExpr :: String -> Except ParseError Expr
parseExpr = runP pExpr
As a result, the function currently works as intended if the given String expression is valid:
ghci> parseExpr "(2+3)-1"
Success (Op (Sub (Op (Add (Val 2.0) (Val 3.0))) (Val 1.0)))
ghci> parseExpr "(2+3-1)-1"
Success (Op (Sub (Op (Sub (Op (Add (Val 2.0) (Val 3.0))) (Val 1.0))) (Val 1.0)))
Otherwise, ErrorAtPos does not point at the right position:
ghci> parseExpr "(2+)-1"
Error (ErrorAtPos 1)
ghci> parseExpr "(2+3-)-1"
Error (ErrorAtPos 1)
What am I doing wrong here? Thank you in advance.
My main assumption was that something was wrong with the (<|>) function of Alternative Parser and that it changed the pos variable incorrectly, so I tried changing it:
(<|>) (P(ES{runES = runnerP})) (P(ES{runES = runnerQ})) =
  P $ ES $ \(pos, s) ->
    case runnerP (pos, s) of
      -- Error _ -> runnerQ (pos, s)
      Error (ErrorAtPos pos') -> runnerQ (pos' + pos, s)
      Success res -> Success res
But that led to even stranger results:
ghci> parseExpr "(5+)-3"
Error (ErrorAtPos 84)
ghci> parseExpr "(5+2-)-3"
Error (ErrorAtPos 372)
Then my doubts turned to the joinExceptState function of instance Monad (ExceptState e s). In spite of everything I've run it through, I suspected that it wasn't working on s of type (Natural, String) as I intended in this case. But I can't really change it for this concrete type only.
Excellent question, although it would have been even better if it really included all your code. I filled in the missing pieces:
mapAnnotated :: (a -> b) -> Annotated s a -> Annotated s b
mapAnnotated f (a :# e) = (f a) :# e

runP :: Parser a -> String -> Except ParseError a
runP (P (ES {runES = p})) s = case p (0, s) of
  Error e          -> Error e
  Success (a :# e) -> Success a
Why is parseExpr "(5+)-3" equal to Error (ErrorAtPos 1)? Here's what happens: we call parseExpr, which (ultimately) calls parseTerm, which is just pParenth <|> parseVal. pParenth fails, of course, so we look at the definition of <|> to work out what to do. That definition says: if the thing on the left fails, try the thing on the right. So we try the thing on the right (i.e. parseVal), which also fails, and we report the second error, which is in fact at position 1.
To see this more clearly, you can replace pParenth <|> parseVal with parseVal <|> pParenth and observe that you get ErrorAtPos 2 instead.
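The effect can be reproduced in isolation. The following is a minimal sketch, not the code from the question: Toy, failHere, item, sat, orElse, andThen, parenDigit and errorPos are hypothetical names, and the parser is built on Either instead of ExceptState. It only assumes the same choice rule as the question's <|>, namely that the left error is thrown away and the right parser is rerun from the original state.

```haskell
import Data.Char (isDigit)

-- Toy parser: takes (position, input) and returns either an error position
-- or a result together with the new (position, input).
newtype Toy a = Toy { runToy :: (Int, String) -> Either Int (a, (Int, String)) }

-- Stand-in for parseError: fail at the current position.
failHere :: Toy a
failHere = Toy $ \(pos, _) -> Left pos

-- Stand-in for pChar: consume one character, advancing the position.
item :: Toy Char
item = Toy $ \(pos, s) -> case s of
  []     -> Left pos
  (c:cs) -> Right (c, (pos + 1, cs))

-- Stand-in for `mfilter p pChar`: the character is consumed first, so a
-- failed test is reported one position after the offending character.
sat :: (Char -> Bool) -> Toy Char
sat p = Toy $ \st -> case runToy item st of
  Left e -> Left e
  Right (c, st')
    | p c       -> Right (c, st')
    | otherwise -> runToy failHere st'

-- Stand-in for the question's (<|>): if the left parser fails, discard its
-- error and rerun the right parser from the original state.
orElse :: Toy a -> Toy a -> Toy a
orElse p q = Toy $ \st -> case runToy p st of
  Left _  -> runToy q st
  Right r -> Right r

-- Sequencing helper: run p, then q, and keep q's result.
andThen :: Toy a -> Toy b -> Toy b
andThen p q = Toy $ \st -> case runToy p st of
  Left e         -> Left e
  Right (_, st') -> runToy q st'

-- '(' digit ')': on "(5+)" this gets as far as the '+' before failing.
parenDigit :: Toy Char
parenDigit = sat (== '(') `andThen` sat isDigit `andThen` sat (== ')')

-- Error position reported by a parser on an input, if it fails.
errorPos :: Toy a -> String -> Maybe Int
errorPos p s = either Just (const Nothing) (runToy p (0, s))

-- errorPos (parenDigit `orElse` sat isDigit) "(5+)"  gives Just 1
-- errorPos (sat isDigit `orElse` parenDigit) "(5+)"  gives Just 3
```

In both orders the reported position is simply that of whichever alternative ran last, regardless of how far the other one got.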
This is almost certainly not the behaviour you want. The documentation of Megaparsec's p <|> q says:
If [parser] p fails without consuming any input, parser q is tried.
(Emphasis in original; the implication is that parser q is not tried in other cases.) This is a more useful thing to do. If you got reasonably far trying to parse a parenthesised expression and then got an error, you probably want to report that error rather than complaining that '(' isn't a digit.
Since you say this is an exercise, I'm not going to tell you how to fix the problem. I'll tell you some other stuff, though.
First, this is not your only issue with error reporting. Above we see that parseVal "(1" reports an error at position 1 (after the problematic character, which is at position 0), whereas pParenth "(5+)-3" reports an error at position 2 (before the problematic character, which is at position 3). Ideally, both should give the position of the problematic character itself. (Of course, it'd be even better if the parser stated what character it expected, but that's more difficult to do.)
Second, the way I found the problem was to import Debug.Trace, replace your definition of pChar with
pChar :: Parser Char
pChar = P $ ES $ \(pos, s) -> traceShow (pos, s) $
  case s of
    []     -> Error (ErrorAtPos pos)
    (c:cs) -> Success (c :# (pos + 1, cs))
and mull over the output for a bit. Debug.Trace is sometimes less useful than one hopes, because of lazy evaluation, but for a program like this it can help a lot.
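The caveat about lazy evaluation can be illustrated with a small sketch that is unrelated to the parser; noisy, firstOnly and countOnly are hypothetical names. A trace message is emitted (on stderr) only when the traced value is actually demanded, so messages can appear later than expected, in a surprising order, or not at all.

```haskell
import Debug.Trace (trace)

-- Announces itself when, and only when, its result is evaluated.
noisy :: Int -> Int
noisy n = trace ("evaluating " ++ show n) (n * 2)

-- Only the first element is demanded, so only that element is traced.
firstOnly :: Int
firstOnly = case map noisy [1, 2, 3] of
  (x:_) -> x
  []    -> 0

-- length inspects the list spine but never the elements, so evaluating
-- this is not expected to produce any trace output.
countOnly :: Int
countOnly = length (map noisy [1, 2, 3])
```

In the pChar case the trace wraps the whole case expression inside the parser function, so it fires each time pChar is actually run on a state, which is why it works well there.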
Third, if you modify your definition of <|> to match Megaparsec's, you might need Megaparsec's try combinator. (Not for the grammar you're trying to parse now, but maybe later.) try solves the issue that
(singleChar 'p' *> singleChar 'q') <|> (singleChar 'p' *> singleChar 'r')
fails on the string "pr" with Megaparsec's <|>.
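To make the "pr" example concrete, here is a sketch of that choice rule on a toy parser type. It is neither Megaparsec's implementation nor the question's ExceptState: Toy, next, orElse, attempt, pq, pr and run are hypothetical names, and "consumed input" is approximated by comparing the error position with the position where the parser started.

```haskell
-- Toy parser over (position, input). Left carries the position of the
-- offending character; a failure counts as "having consumed input" when
-- that position lies beyond the position where the parser started.
newtype Toy a = Toy { runToy :: (Int, String) -> Either Int (a, (Int, String)) }

-- Match one specific character; on a mismatch, fail without consuming it.
singleChar :: Char -> Toy Char
singleChar c = Toy $ \(pos, s) -> case s of
  (x:xs) | x == c -> Right (x, (pos + 1, xs))
  _               -> Left pos

-- Sequencing that keeps the right-hand result (a stand-in for *>).
next :: Toy a -> Toy b -> Toy b
next p q = Toy $ \st -> case runToy p st of
  Left e         -> Left e
  Right (_, st') -> runToy q st'

-- Committed choice: q is tried only if p failed at its starting position.
orElse :: Toy a -> Toy a -> Toy a
orElse p q = Toy $ \st@(pos, _) -> case runToy p st of
  Left e
    | e == pos  -> runToy q st
    | otherwise -> Left e
  Right r -> Right r

-- Stand-in for try: any failure of p looks as if nothing had been consumed.
attempt :: Toy a -> Toy a
attempt p = Toy $ \st@(pos, _) -> case runToy p st of
  Left _  -> Left pos
  Right r -> Right r

pq, pr :: Toy Char
pq = singleChar 'p' `next` singleChar 'q'
pr = singleChar 'p' `next` singleChar 'r'

-- Run from position 0 and drop the leftover state.
run :: Toy a -> String -> Either Int a
run p s = fmap fst (runToy p (0, s))

-- run (pq `orElse` pr) "pr"          gives Left 1   (committed to pq)
-- run (attempt pq `orElse` pr) "pr"  gives Right 'r'
```

The cost of attempt is visible in the sketch: the failure is reported at the start position, so the more precise error location from inside pq is lost.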
Fourth, you sometimes write someParser <|> parseError, which I think succeeds and fails in exactly the same cases as someParser for both your definition of <|> and Megaparsec's. (With your definition it does reset the reported error position to the point where someParser started.)
Fifth, you don't need void; just ignore the result, it's the same thing.
Sixth, your Except seems to just be Either.
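A short sketch of that correspondence, with toEither and fromEither as hypothetical helper names. The Except declaration is repeated from the question so that the block stands alone; the Eq instance is an addition made here only so that values can be compared.

```haskell
-- The question's type, repeated for self-containment (Eq added here).
data Except e a = Error e | Success a
  deriving (Show, Eq)

-- Error corresponds to Left, Success to Right.
toEither :: Except e a -> Either e a
toEither (Error e)   = Left e
toEither (Success a) = Right a

fromEither :: Either e a -> Except e a
fromEither (Left e)  = Error e
fromEither (Right a) = Success a
```

The two functions are inverses of each other. One practical consequence is that Either e already comes with Functor, Applicative and Monad instances in base.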