How to return multiple parse failures within Parsec's monadic context?

I am parsing a grammar that consists of exactly two required and unique logical parts, Alpha and Beta. These parts can be defined in either order: Alpha before Beta, or vice versa. I would like to provide robust error messages for less tech-savvy users.

In the example below there are cases where multiple parse failures exist. I concatenate the failure message Strings with the unlines function and pass the result to the fail combinator. When parse is called on grammarDefinition, this creates a ParseError value with a single Message value.

Example Scenario:

import Data.Either                   (partitionEithers)
import Data.Set                      (Set)
import Text.Parsec                   (Parsec)
import Text.Parsec.Char
import Text.ParserCombinators.Parsec

data Result = Result Alpha Beta
type Alpha  = Set (Int,Float)
type Beta   = Set String

grammarDefinition :: Parsec String u Result
grammarDefinition = do
    segments <- partitionEithers <$> many segment
    _        <- eof
    case segments of
      (     [],      []) -> fail $ unlines [missingAlpha, missingBeta]
      (      _,      []) -> fail $ missingBeta
      (     [],       _) -> fail $ missingAlpha
      ((_:_:_), (_:_:_)) -> fail $ unlines [multipleAlpha, multipleBeta]
      (      _, (_:_:_)) -> fail $ multipleBeta
      ((_:_:_),       _) -> fail $ multipleAlpha
      (    [x],     [y]) -> pure $ Result x y
    where
      missingAlpha     = message "No" "alpha"
      missingBeta      = message "No" "beta"
      multipleAlpha    = message "Multiple" "alpha"
      multipleBeta     = message "Multiple" "beta"
      message x y      = concat [x," ",y," defined in input, ","exactly one ",y," definition required"]

-- Type signature is important!
segment :: Parsec String u (Either Alpha Beta)
segment = undefined -- implementation irrelevant

I would like the ParseError to contain multiple Message values in the case of multiple failures. This should be possible, given the existence of the addErrorMessage function. I am not sure how to supply multiple failures within the Parsec monadic context, before the result is materialized by calling parse.
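To make the goal concrete, here is a minimal sketch of what a ParseError with several Message values looks like when it is built outside the parser monad. It uses only functions from Text.Parsec.Error and Text.Parsec.Pos; the helper names manyMessages and messageTexts, and the source name "input", are made up for illustration.

```haskell
import Text.Parsec.Error
  ( Message (..), ParseError, addErrorMessage, errorMessages
  , messageString, newErrorUnknown )
import Text.Parsec.Pos (initialPos)

-- Hypothetical helper: build a ParseError carrying one Message per string.
-- The position is fixed at the start of a source named "input".
manyMessages :: [String] -> ParseError
manyMessages =
  foldr (addErrorMessage . Message) (newErrorUnknown (initialPos "input"))

-- Hypothetical helper: recover the message texts stored in a ParseError.
messageTexts :: ParseError -> [String]
messageTexts = map messageString . errorMessages
```

The open question is how to get the same kind of value out of a failing parser, where the position comes from the parser state rather than from initialPos.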

Example Function:

fails :: [String] -> ParsecT s u m a
fails = undefined -- Not sure how to define this!

How do I supply multiple Message values to the ParseError result within Parsec's monadic context?

fail in this case is equivalent to parserFail, defined in Text.Parsec.Prim:

parserFail :: String -> ParsecT s u m a
parserFail msg
    = ParsecT $ \s _ _ _ eerr ->
      eerr $ newErrorMessage (Message msg) (statePos s)

Since newErrorMessage and addErrorMessage both create a ParseError, this variation of parserFail should also work:

parserFail' :: String -> ParsecT s u m a
parserFail' msg
    = ParsecT $ \s _ _ _ eerr ->
      eerr $ theMessages s
  where
    -- The where clause must be indented under the definition it belongs to.
    theMessages s =
      addErrorMessage (Message "blah") $
        addErrorMessage (Expect "expected this") $
          newErrorMessage (Message msg) (statePos s)

which should push three messages onto the error message list.

Also in that module, have a look at label and labels, which is the only place where addErrorMessage is used. labels is just a multi-message version of the <?> operator. Note how it uses foldr to build up a compound error message:

labels :: ParsecT s u m a -> [String] -> ParsecT s u m a
labels p msgs =
    ParsecT $ \s cok cerr eok eerr ->
    let eok' x s' error = eok x s' $ if errorIsUnknown error
                  then error
                  else setExpectErrors error msgs
        eerr' err = eerr $ setExpectErrors err msgs

    in unParser p s cok cerr eok' eerr'

 where
   setExpectErrors err []         = setErrorMessage (Expect "") err
   setExpectErrors err [msg]      = setErrorMessage (Expect msg) err
   setExpectErrors err (msg:msgs)
       = foldr (\msg' err' -> addErrorMessage (Expect msg') err')
         (setErrorMessage (Expect msg) err) msgs

The only gotcha is that you need access to the ParsecT constructor, which is not exported by Text.Parsec.Prim. Maybe you can find a way to use labels, or another way around that problem. Otherwise you could always include your own hacked version of parsec with your code.
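One possible way around the hidden constructor, assuming your parsec version exports mkPT together with the Consumed, Reply and State types from Text.Parsec.Prim (the 3.1.x releases do, as far as I know): mkPT builds a ParsecT from a function over the parser state, so the error can be assembled by hand. This is a sketch, not something taken from the parsec documentation; demo is a hypothetical helper used only to inspect the result.

```haskell
import Text.Parsec.Error
  ( Message (..), addErrorMessage, errorMessages
  , messageString, newErrorUnknown )
import Text.Parsec.Prim
  ( Consumed (..), Parsec, ParsecT, Reply (..), State (..), mkPT, parse )

-- Fail without consuming input, attaching one Message per string.
-- With an empty list the error stays "unknown".
fails :: Monad m => [String] -> ParsecT s u m a
fails msgs = mkPT $ \s ->
  let err = foldr (addErrorMessage . Message)
                  (newErrorUnknown (statePos s))
                  msgs
  in  return (Empty (return (Error err)))

-- Hypothetical helper: run `fails` on empty input and list the
-- message texts found in the resulting ParseError.
demo :: [String] -> [String]
demo msgs =
  case parse (fails msgs :: Parsec String () ()) "input" "" of
    Left err -> map messageString (errorMessages err)
    Right () -> []
```

Because the reply is wrapped in Empty, the failure behaves like fail or mzero with respect to backtracking: alternatives combined with <|> are still tried.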

We can leverage the fact that ParsecT is an instance of MonadPlus, combining mzero with the labels function to derive the desired result:

fails :: [String] -> ParsecT s u m a
fails = labels mzero

Note: The resulting ParseError has many Expect values, not many Message values...
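A small sketch to check that caveat yourself. It specializes the parser to Parsec String () so the example is self-contained; failsExpect and messageKinds are hypothetical names, and only the constructor of each message is inspected.

```haskell
import Control.Monad (mzero)
import Text.Parsec.Error (Message (..), errorMessages)
import Text.Parsec.Prim (Parsec, labels, parse)

-- The labels-based definition, specialized to a concrete parser type.
failsExpect :: [String] -> Parsec String () a
failsExpect = labels mzero

-- Hypothetical helper: name the constructor of every message in the
-- ParseError produced by running failsExpect on empty input.
messageKinds :: [String] -> [String]
messageKinds msgs =
  case parse (failsExpect msgs :: Parsec String () ()) "input" "" of
    Left err -> map kind (errorMessages err)
    Right () -> []
  where
    kind (SysUnExpect _) = "SysUnExpect"
    kind (UnExpect _)    = "UnExpect"
    kind (Expect _)      = "Expect"
    kind (Message _)     = "Message"
```

Since the messages are Expect values, the rendered error presents them as things the parser was expecting rather than as free-form failure text, which may or may not suit user-facing diagnostics.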

I would recommend transitioning from Parsec to the newer and more extensible Megaparsec library.

This exact issue has been resolved since version 4.2.0.0.

Multiple parse error Messages can easily be created with the following function:

fails :: MonadParsec m => [String] -> m a
fails = failure . fmap Message
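The definition above targets the Megaparsec 4.x API. Later major versions reworked the error types, so as an assumption-laden sketch for Megaparsec 7 or newer, the closest equivalent I am aware of uses fancyFailure with a set of ErrorFail values. The Parser alias and the failureCount helper are hypothetical and exist only to make the example self-contained.

```haskell
import qualified Data.List.NonEmpty as NE
import qualified Data.Set as Set
import Data.Void (Void)
import Text.Megaparsec

-- Hypothetical parser type: no custom error component, String input.
type Parser = Parsec Void String

-- Fail with one ErrorFail entry per string (duplicates collapse,
-- because the entries are stored in a Set).
fails :: [String] -> Parser a
fails = fancyFailure . Set.fromList . map ErrorFail

-- Hypothetical helper: count the fancy error entries reported when
-- `fails` is run on empty input.
failureCount :: [String] -> Int
failureCount msgs =
  case parse (fails msgs :: Parser ()) "input" "" of
    Left bundle ->
      sum [ Set.size xs | FancyError _ xs <- NE.toList (bundleErrors bundle) ]
    Right () -> 0
```

Check the changelog of the Megaparsec version you actually depend on before relying on either form, since the error API changed more than once between major releases.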

