Is there a trick to translating BNF into a Parsec program?

Question

The following BNF matches function call chains (such as x(y)(z)...):

expr = term T
T    = (expr) T
      | EMPTY
term = (expr)
      | VAR

Translating it into a Parsec program looks tricky:

term :: Parser Term
term = parens expr <|> var

expr :: Parser Term
expr = do whiteSpace
          e <- term
          maybeAddSuffix e
  where addSuffix e0 = do e1 <- parens expr
                          maybeAddSuffix $ TermApp e0 e1
        maybeAddSuffix e = addSuffix e
                           <|> return e

Could you list the design patterns for translating BNF into a Parsec program?

Answer 1

The simplest thing you could do if your grammar is sizeable is to just use the Alex/Happy combo. It is fairly straightforward to use, accepts the BNF format directly (no human translation needed) and, perhaps most importantly, produces blazingly fast parsers/lexers.

If you are dead set on doing it with Parsec (or you are doing this as a learning exercise), I find it easier in general to do it in two stages: first lexing, then parsing. Parsec will do both!

First write the appropriate types:

{-# LANGUAGE LambdaCase #-}

import Text.Parsec 
import Text.Parsec.Combinator 
import Text.Parsec.Prim
import Text.Parsec.Pos
import Text.ParserCombinators.Parsec.Char 
import Control.Applicative hiding ((<|>))
import Control.Monad 

data Term = App Term Term | Var String deriving (Show, Eq)

data Token = LParen | RParen | Str String deriving (Show, Eq)

type Lexer = Parsec [Char] ()   -- A lexer accepts a stream of Char
type Parser = Parsec [Token] () -- A parser accepts a stream of Token

Parsing a single token is simple. For simplicity, a variable is 1 or more letters. You can of course change this however you like.

oneToken :: Lexer Token
oneToken = (char '(' >> return LParen) <|> 
           (char ')' >> return RParen) <|>
           (Str <$> many1 letter)

Parsing the entire token stream is just parsing a single token many times, possibly separated by whitespace:

lexer :: Lexer [Token]
lexer = spaces >> many1 (oneToken <* spaces)

Note the placement of spaces: this way, whitespace is accepted at the beginning and end of the string.
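To see what the lexing stage produces on its own, here is a minimal self-contained sketch. It repeats the token type and lexer from above; tokenize is a hypothetical helper name introduced only for this illustration.

```haskell
import Text.Parsec

-- Same token type as in the answer.
data Token = LParen | RParen | Str String deriving (Show, Eq)

type Lexer = Parsec String ()

oneToken :: Lexer Token
oneToken = (LParen <$ char '(')
       <|> (RParen <$ char ')')
       <|> (Str <$> many1 letter)

lexer :: Lexer [Token]
lexer = spaces >> many1 (oneToken <* spaces)

-- tokenize is a hypothetical helper, not part of the original answer.
-- eof forces the lexer to consume the whole input.
tokenize :: String -> Either ParseError [Token]
tokenize = runParser (lexer <* eof) () ""

main :: IO ()
main = do
  print (tokenize "  x (y)  ")  -- Right [Str "x",LParen,Str "y",RParen]
  print (tokenize "x + y")      -- a Left value: '+' is not a valid token
```

Leading and trailing whitespace is dropped, and any character that is not a parenthesis, letter, or space makes the lexing stage fail before the parser ever runs.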

Since Parser uses a custom token type, you have to use a custom satisfy function. Fortunately, this is almost identical to the existing satisfy.

satisfy' :: (Token -> Bool) -> Parser Token
satisfy' f = tokenPrim show 
                       (\src _ _ -> incSourceColumn src 1) 
                       (\x -> if f x then Just x else Nothing)

Then we can write parsers for each of the primitive tokens.

lparen = satisfy' $ \case { LParen -> True ; _ -> False } 
rparen = satisfy' $ \case { RParen -> True ; _ -> False } 
strTok = (\(Str s) -> s) <$> (satisfy' $ \case { Str {} -> True ; _ -> False })

As you may imagine, parens would be useful for our purposes. It is very straightforward to write.

parens :: Parser a -> Parser a 
parens = between lparen rparen

Now the interesting parts.

term, expr, var :: Parser Term

term = parens expr <|> var

var = Var <$> strTok

These two should be fairly obvious to you.

Parsec contains the combinators option and optionMaybe, which are useful when you need to "maybe do something".

expr = do 
  e0 <- term 
  option e0 (parens expr >>= \e1 -> return (App e0 e1))

The last line means: try to apply the parser given to option; if it fails (without consuming input), return e0 instead.
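As a small illustration of how a BNF alternative of the form "X | EMPTY" maps onto these combinators, here is a sketch on a plain character stream. The call parser and its toy grammar (letters followed by an optional parenthesized argument) are assumptions made for this example only.

```haskell
import Text.Parsec

-- Toy grammar (hypothetical, for illustration only):
--   call   = LETTERS suffix
--   suffix = "(" LETTERS ")" | EMPTY
call :: Parsec String () (String, Maybe String)
call = do
  f   <- many1 letter
  -- optionMaybe yields Nothing when the suffix is absent.
  arg <- optionMaybe (between (char '(') (char ')') (many1 letter))
  return (f, arg)

main :: IO ()
main = do
  print (parse call "" "f(x)")  -- Right ("f",Just "x")
  print (parse call "" "f")     -- Right ("f",Nothing)
  print (parse call "" "f(")    -- a Left value: the suffix failed after consuming '('
```

The last case shows the caveat: option and optionMaybe only fall back to the default when the inner parser fails without consuming input, so a half-matched suffix is reported as an error.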

For testing you can do:

tokAndParse = runParser (lexer <* eof) () "" >=> runParser (expr <* eof) () ""

The eof attached to each parser makes sure that the entire input is consumed; the string cannot be a member of the grammar if there are extra trailing characters. Note that this parser rejects your example x(y)(z), even though the BNF derives it (term = x, then T = (y) T, T = (z) T, T = EMPTY): expr as written accepts at most one parenthesized suffix after a term.

>tokAndParse "x(y)(z)"
Left (line 1, column 5):
unexpected LParen
expecting end of input

But the following parses:

>tokAndParse "(x(y))(z)"
Right (App (App (Var "x") (Var "y")) (Var "z"))
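The rule T = (expr) T | EMPTY is a repetition, so one possible way to accept whole chains such as x(y)(z) is to collect zero or more parenthesized arguments with many and fold them to the left. The following is a self-contained sketch under that assumption, not the only way to do it; tok and parseChain are hypothetical helper names introduced here.

```haskell
{-# LANGUAGE LambdaCase #-}
import Text.Parsec
import Text.Parsec.Pos (incSourceColumn)
import Control.Monad ((>=>))

data Term  = App Term Term | Var String deriving (Show, Eq)
data Token = LParen | RParen | Str String deriving (Show, Eq)

type Lexer  = Parsec String ()
type Parser = Parsec [Token] ()

oneToken :: Lexer Token
oneToken = (LParen <$ char '(')
       <|> (RParen <$ char ')')
       <|> (Str <$> many1 letter)

lexer :: Lexer [Token]
lexer = spaces >> many1 (oneToken <* spaces)

-- Hypothetical helper: match one token and extract a value from it.
tok :: (Token -> Maybe a) -> Parser a
tok = tokenPrim show (\pos _ _ -> incSourceColumn pos 1)

lparen, rparen :: Parser ()
lparen = tok $ \case { LParen -> Just (); _ -> Nothing }
rparen = tok $ \case { RParen -> Just (); _ -> Nothing }

var :: Parser Term
var = tok $ \case { Str s -> Just (Var s); _ -> Nothing }

-- term = (expr) | VAR
term :: Parser Term
term = between lparen rparen expr <|> var

-- expr = term T ;  T = (expr) T | EMPTY
-- T becomes "zero or more arguments"; foldl makes application left-associative.
expr :: Parser Term
expr = do
  f    <- term
  args <- many (between lparen rparen expr)
  return (foldl App f args)

-- Hypothetical helper: lex, then parse, both requiring end of input.
parseChain :: String -> Either ParseError Term
parseChain = runParser (lexer <* eof) () "" >=> runParser (expr <* eof) () ""

main :: IO ()
main = print (parseChain "x(y)(z)")
-- Right (App (App (Var "x") (Var "y")) (Var "z"))
```

As a rule of thumb, a right-recursive "tail" rule ending in EMPTY translates to many (or many1 when at least one occurrence is required), an "X | EMPTY" alternative translates to option or optionMaybe, and plain alternatives translate to <|>.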
