简体   繁体   English

使用Parsec解析命令式语言的奇怪行为

[英]Strange behaviour parsing an imperative language using Parsec

I'm trying to parse a fragment of the Abap language with Parsec in haskell. 我正在尝试用haskell中的Parsec解析Abap语言的片段。 The statements in Abap are delimited by dots. Abap中的语句由点分隔。 The syntax for function definition is: 函数定义的语法是:

FORM <name> <arguments>.
    <statements>.
ENDFORM.

I will use it as a minimal example. 我会用它作为一个最小的例子。 Here is my attempt at writing the corresponding type in haskell and the parser. 这是我尝试在haskell和解析器中编写相应的类型。 The GenStatement -Constructor is for all other statements except function definition as described above. GenStatement -Constructor用于除上述函数定义之外的所有其他语句。

module Main where

import Control.Applicative
import Data.Functor.Identity

import qualified Text.Parsec as P
import qualified Text.Parsec.String as S
import Text.Parsec.Language
import qualified Text.Parsec.Token as T

type Args = String
type Name = String

data AbapExpr -- ABAP Program
   = Form Name Args [AbapExpr]
   | GenStatement String [AbapExpr]
   deriving (Show, Read)

lexer :: T.TokenParser ()
lexer = T.makeTokenParser style
  where
    caseSensitive = False
    keys = ["form", "endform"]
    style = emptyDef
        { T.reservedNames = keys
        , T.identStart = P.alphaNum <|> P.char '_'
        , T.identLetter = P.alphaNum <|> P.char '_'
        }

dot :: S.Parser String
dot = T.dot lexer

reserved :: String -> S.Parser ()
reserved = T.reserved lexer

identifier :: S.Parser String
identifier = T.identifier lexer

argsP :: S.Parser String
argsP = P.manyTill P.anyChar (P.try (P.lookAhead dot))

genericStatementP :: S.Parser String
genericStatementP = P.manyTill P.anyChar (P.try dot)

abapExprP = P.try (P.between (reserved "form")
                             (reserved "endform" >> dot)
                             abapFormP)
    <|> abapStmtP
  where
    abapFormP = Form <$> identifier <*> argsP <* dot <*> many abapExprP
    abapStmtP = GenStatement <$> genericStatementP <*> many abapExprP

Testing the parser with the following input results in a strange behaviour. 使用以下输入测试解析器会导致奇怪的行为。

-- a wrapper for convenience
parse :: S.Parser a -> String -> Either P.ParseError a
parse = flip P.parse "Test"

testParse1 = parse abapExprP "form foo arg1 arg2 arg2. form bar arg1. endform. endform."

results in 结果是

Right (GenStatement "form foo arg1 arg2 arg2" [GenStatement "form bar arg1" [GenStatement "endform" [GenStatement "endform" []]]])

so it seems the first brach always fails and only the second generic branch is successful. 所以似乎第一个brach总是失败,只有第二个通用分支成功。 However if the second branch (parsing generic statements) is commented parsing forms suddenly succeeds: 但是,如果第二个分支(解析通用语句)被注释,解析表单突然成功:

abapExprP = P.try (P.between (reserved "form")
                             (reserved "endform" >> dot)
                             abapFormP)
    --    <|> abapStmtP
  where
    abapFormP = Form <$> identifier <*> argsP <* dot <*> many abapExprP
    -- abapStmtP = GenStatement <$> genericStatementP <*> many abapExprP

Now we get 现在我们得到了

 Right (Form "foo" "arg1 arg2 arg2" [Form "bar" "arg1" []])

How is this possible? 这怎么可能? It seems that the first branch succeeds so why doesn't it work in the first example - what am I missing? 似乎第一个分支成功了,为什么它不能在第一个例子中起作用 - 我缺少什么?

Many thanks in advance! 提前谢谢了!

Looks for me that your parser genericStatementP parses any character until a dot appears (you are using P.anyChar ). 找我,你的解析器genericStatementP解析任何字符,直到出现一个点(你正在使用P.anyChar )。 Hence it doesn't recognize the reserved keywords for your lexer. 因此,它无法识别词法分析器的保留关键字。

I think you must define: 我想你必须定义:

type Args = [String]

and: 和:

argsP :: S.Parser [String]
argsP = P.manyTill identifier (P.try (P.lookAhead dot))

genericStatementP :: S.Parser String
genericStatementP = identifier

With these changes I get the following result: 通过这些更改,我得到以下结果:

Right (Form "foo" ["arg1","arg2","arg2"] [Form "bar" ["arg1"] []])

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM