
Is there any trick to translating BNF to a Parsec program?

The following BNF matches a function call chain (like x(y)(z)...):

expr = term T
T    = (expr) T
      | EMPTY
term = (expr)
      | VAR 

Translating it to a Parsec program gives something that looks rather tricky:

term :: Parser Term
term = parens expr <|> var

expr :: Parser Term
expr = do whiteSpace
          e <- term
          maybeAddSuffix e
  where addSuffix e0 = do e1 <- parens expr
                          maybeAddSuffix $ TermApp e0 e1
        maybeAddSuffix e = addSuffix e
                           <|> return e

Could you list the design patterns for translating BNF to a Parsec program?

The simplest thing you could do if your grammar is sizeable is to just use the Alex/Happy combo. It is fairly straightforward to use, accepts the BNF format directly (no human translation needed) and, perhaps most importantly, produces blazingly fast parsers/lexers.

If you are dead set on doing it with Parsec (or you are doing this as a learning exercise), I find it easier in general to do it in two stages: first lexing, then parsing. Parsec will do both!

First write the appropriate types:

{-# LANGUAGE LambdaCase #-}

import Text.Parsec 
import Text.Parsec.Combinator 
import Text.Parsec.Prim
import Text.Parsec.Pos
import Text.ParserCombinators.Parsec.Char 
import Control.Applicative hiding ((<|>))
import Control.Monad 

data Term = App Term Term | Var String deriving (Show, Eq)

data Token = LParen | RParen | Str String deriving (Show, Eq)

type Lexer = Parsec [Char] ()   -- A lexer accepts a stream of Char
type Parser = Parsec [Token] () -- A parser accepts a stream of Token

Parsing a single token is simple. For simplicity, a variable is one or more letters. You can of course change this however you like.

oneToken :: Lexer Token
oneToken = (char '(' >> return LParen) <|> 
           (char ')' >> return RParen) <|>
           (Str <$> many1 letter)

Parsing the entire token stream is just parsing a single token many times, possibly separated by whitespace:

lexer :: Lexer [Token]
lexer = spaces >> many1 (oneToken <* spaces) 

Note the placement of spaces: this way, whitespace is accepted at the beginning and end of the string as well as between tokens.
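To see the effect of that placement, here is a minimal self-contained sketch of the lexing stage on its own. The helper name lexString is an assumption added for illustration and is not part of the answer; everything else mirrors the definitions above.

```haskell
import Text.Parsec

data Token = LParen | RParen | Str String deriving (Show, Eq)

type Lexer = Parsec String ()

-- One token: a parenthesis or a run of letters.
oneToken :: Lexer Token
oneToken = (LParen <$ char '(')
       <|> (RParen <$ char ')')
       <|> (Str <$> many1 letter)

-- Skip leading whitespace once, then skip it again after every token.
lexer :: Lexer [Token]
lexer = spaces >> many1 (oneToken <* spaces)

-- Hypothetical helper (not in the original answer): lex a whole string,
-- failing if anything is left over after the last token.
lexString :: String -> Either ParseError [Token]
lexString = parse (lexer <* eof) ""
```

With these definitions, lexString "  x ( y )  " yields Right [Str "x",LParen,Str "y",RParen], while an input such as "x+y" is rejected because '+' is not a token and eof then fails.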

Since Parser uses a custom token type, you have to use a custom satisfy function. Fortunately, this is almost identical to the existing satisfy.

satisfy' :: (Token -> Bool) -> Parser Token
satisfy' f = tokenPrim show 
                       (\src _ _ -> incSourceColumn src 1) 
                       (\x -> if f x then Just x else Nothing)
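One consequence of incSourceColumn is that error columns count tokens rather than characters. If character positions matter, a possible variant (a sketch, not what the answer does) is to have the lexer pair each token with its starting SourcePos and to build the primitive with Parsec's token function, which reads positions from the tokens themselves. The names PosToken, satisfyPos and varsOnly are assumptions introduced for this illustration.

```haskell
import Text.Parsec

data Token = LParen | RParen | Str String deriving (Show, Eq)

-- Assumption for this sketch: every token carries the position where it started.
type PosToken = (SourcePos, Token)

type Lexer  = Parsec String ()
type Parser = Parsec [PosToken] ()

oneToken :: Lexer PosToken
oneToken = do
  pos <- getPosition
  t   <- (LParen <$ char '(') <|> (RParen <$ char ')') <|> (Str <$> many1 letter)
  return (pos, t)

lexer :: Lexer [PosToken]
lexer = spaces >> many1 (oneToken <* spaces)

-- Like satisfy', but `token` takes the position from the next token in the stream.
satisfyPos :: (Token -> Bool) -> Parser Token
satisfyPos f = token (show . snd) fst (\(_, t) -> if f t then Just t else Nothing)

-- Hypothetical demo: lex, then require that the stream holds only variables.
varsOnly :: String -> Either ParseError [Token]
varsOnly s = parse (lexer <* eof) "" s >>= parse (many1 (satisfyPos isStr) <* eof) ""
  where isStr (Str _) = True
        isStr _       = False
```

For the input "ab cd (", the parse error from varsOnly is reported at column 7, where the parenthesis actually sits, rather than at column 3 (the third token).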

Then we can write parsers for each of the primitive tokens.

lparen = satisfy' $ \case { LParen -> True ; _ -> False } 
rparen = satisfy' $ \case { RParen -> True ; _ -> False } 
strTok = (\(Str s) -> s) <$> (satisfy' $ \case { Str {} -> True ; _ -> False })

As you may imagine, parens would be useful for our purposes. It is very straightforward to write.

parens :: Parser a -> Parser a 
parens = between lparen rparen 

Now the interesting parts.

term, expr, var :: Parser Term

term = parens expr <|> var

var = Var <$> strTok 

These two should be fairly obvious to you.

Parsec contains the combinators option and optionMaybe, which are useful when you need to "maybe do something".

expr = do 
  e0 <- term 
  option e0 (parens expr >>= \e1 -> return (App e0 e1))

The last line means: try to apply the parser given to option; if it fails, return e0 instead.
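As a side-by-side illustration of the two combinators, here is a small sketch that works directly on a Char stream so it can be run on its own. The parser names var, arg, withOption and withOptionMaybe are assumptions made for this example, and arg only accepts a single variable to keep it short.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Term = App Term Term | Var String deriving (Show, Eq)

var :: Parser Term
var = Var <$> many1 letter

-- A single parenthesized variable, e.g. "(x)".
arg :: Parser Term
arg = between (char '(') (char ')') var

-- option: pass the fallback value directly.
withOption :: Parser Term
withOption = do
  f <- var
  option f (App f <$> arg)

-- optionMaybe: get a Maybe back and decide afterwards.
withOptionMaybe :: Parser Term
withOptionMaybe = do
  f <- var
  m <- optionMaybe arg
  return (maybe f (App f) m)
```

Both parsers accept "f" and "f(x)" and give the same results. Note that option and optionMaybe only fall back when the inner parser fails without consuming input: on "f(" the argument parser has already consumed the parenthesis, so the whole parse fails instead of returning Var "f".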

For testing you can do:

tokAndParse = runParser (lexer <* eof) () "" >=> runParser (expr <* eof) () ""

The eof attached to each parser makes sure that the entire input is consumed; a string with extra trailing characters is rejected. Note that the expr above applies at most one parenthesized argument, so it rejects your example x(y)(z), even though the recursive T rule in your grammar does derive it:

>tokAndParse "x(y)(z)"
Left (line 1, column 5):
unexpected LParen
expecting end of input

But the following is accepted:

>tokAndParse "(x(y))(z)"
Right (App (App (Var "x") (Var "y")) (Var "z"))
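If chains such as x(y)(z) should be accepted, as the recursive T rule in the question's grammar suggests, one possible approach is to read "T = (expr) T | EMPTY" as "zero or more (expr)" and fold the collected arguments to the left. The following is a self-contained sketch under that reading; the helper names tok and parseChain are assumptions introduced here, and positions still count tokens as in satisfy'.

```haskell
import Text.Parsec
import Control.Monad ((>=>))

data Term  = App Term Term | Var String deriving (Show, Eq)
data Token = LParen | RParen | Str String deriving (Show, Eq)

type Lexer  = Parsec String ()
type Parser = Parsec [Token] ()

oneToken :: Lexer Token
oneToken = (LParen <$ char '(') <|> (RParen <$ char ')') <|> (Str <$> many1 letter)

lexer :: Lexer [Token]
lexer = spaces >> many1 (oneToken <* spaces)

-- Match one token and extract a value from it (hypothetical helper).
tok :: (Token -> Maybe a) -> Parser a
tok = tokenPrim show (\pos _ _ -> incSourceColumn pos 1)

lparen, rparen :: Parser ()
lparen = tok (\t -> if t == LParen then Just () else Nothing)
rparen = tok (\t -> if t == RParen then Just () else Nothing)

var :: Parser Term
var = tok (\t -> case t of { Str s -> Just (Var s); _ -> Nothing })

parens :: Parser a -> Parser a
parens = between lparen rparen

term, expr :: Parser Term
term = parens expr <|> var
-- expr = term T: one term followed by any number of parenthesized arguments,
-- applied left to right.
expr = foldl App <$> term <*> many (parens expr)

parseChain :: String -> Either ParseError Term
parseChain = parse (lexer <* eof) "" >=> parse (expr <* eof) ""
```

With this version, parseChain "x(y)(z)" and parseChain "(x(y))(z)" both give Right (App (App (Var "x") (Var "y")) (Var "z")). In general, a tail-recursive rule of the form "A = B T; T = C T | EMPTY" maps onto a parser for B followed by many applied to a parser for C.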
