
Is there any trick to translating BNF into a Parsec program?

A BNF that matches a function call chain (like x(y)(z)...):

expr = term T
T    = (expr) T
      | EMPTY
term = (expr)
      | VAR 

Translating it into a Parsec program gives something that looks rather tricky:

term :: Parser Term
term = parens expr <|> var

expr :: Parser Term
expr = do whiteSpace
          e <- term
          maybeAddSuffix e
  where addSuffix e0 = do e1 <- parens expr
                          maybeAddSuffix $ TermApp e0 e1
        maybeAddSuffix e = addSuffix e
                           <|> return e

Could you list the design patterns for translating BNF into a Parsec program?

The simplest thing you could do if your grammar is sizeable is to just use the Alex/Happy combo. It is fairly straightforward to use, accepts a BNF-style grammar more or less directly (no human translation needed) and, perhaps most importantly, produces fast parsers and lexers.

If you are dead set on doing it with Parsec (or you are doing this as a learning exercise), I find it easier in general to do it in two stages: first lexing, then parsing. Parsec will do both!

First write the appropriate types:

{-# LANGUAGE LambdaCase #-}

import Text.Parsec 
import Text.Parsec.Combinator 
import Text.Parsec.Prim
import Text.Parsec.Pos
import Text.ParserCombinators.Parsec.Char 
import Control.Applicative hiding ((<|>), many, optional)  -- Parsec exports its own versions of these
import Control.Monad 

data Term = App Term Term | Var String deriving (Show, Eq)

data Token = LParen | RParen | Str String deriving (Show, Eq)

type Lexer = Parsec [Char] ()   -- A lexer accepts a stream of Char
type Parser = Parsec [Token] () -- A parser accepts a stream of Token

Parsing a single token is simple. For simplicity, a variable is 1 or more letters. You can of course change this however you like.

oneToken :: Lexer Token
oneToken = (char '(' >> return LParen) <|> 
           (char ')' >> return RParen) <|>
           (Str <$> many1 letter)

Parsing the entire token stream is just parsing a single token many times, possibly separated by whitespace:

lexer :: Lexer [Token]
lexer = spaces >> many1 (oneToken <* spaces) 

Note the placement of spaces: this way, whitespace is accepted at the beginning and end of the string.
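To check the lexer on its own before writing the parser, something like the following should work. This is only a minimal sketch: tokenize is a hypothetical helper name, not part of Parsec, and the definitions are repeated so the snippet loads by itself.

```haskell
import Text.Parsec

data Token = LParen | RParen | Str String deriving (Show, Eq)

type Lexer = Parsec String ()

oneToken :: Lexer Token
oneToken = (char '(' >> return LParen) <|>
           (char ')' >> return RParen) <|>
           (Str <$> many1 letter)

lexer :: Lexer [Token]
lexer = spaces >> many1 (oneToken <* spaces)

-- Hypothetical helper: lex a whole string, failing if any input is left over.
tokenize :: String -> Either ParseError [Token]
tokenize = runParser (lexer <* eof) () ""
```

In GHCi, tokenize " x (y) " gives Right [Str "x",LParen,Str "y",RParen], while tokenize "x + y" gives a Left because '+' is not a token.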

Since Parser uses a custom token type, you have to use a custom satisfy function. Fortunately, this is almost identical to the existing satisfy.

satisfy' :: (Token -> Bool) -> Parser Token
satisfy' f = tokenPrim show 
                       (\src _ _ -> incSourceColumn src 1) 
                       (\x -> if f x then Just x else Nothing)

Then we can write parsers for each of the primitive tokens.

lparen = satisfy' $ \case { LParen -> True ; _ -> False } 
rparen = satisfy' $ \case { RParen -> True ; _ -> False } 
strTok = (\(Str s) -> s) <$> (satisfy' $ \case { Str {} -> True ; _ -> False })

As you may imagine, parens would be useful for our purposes. It is very straightforward to write.

parens :: Parser a -> Parser a 
parens = between lparen rparen 

Now the interesting parts.

term, expr, var :: Parser Term

term = parens expr <|> var

var = Var <$> strTok 

These two should be fairly obvious to you.

Parsec contains the combinators option and optionMaybe, which are useful when you need to "maybe do something".

expr = do 
  e0 <- term 
  option e0 (parens expr >>= \e1 -> return (App e0 e1))

The last line means: try to apply the parser given to option; if it fails without consuming any input, return e0 instead.
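If option is new to you, it may help to see it in isolation on a plain character stream. The sketch below is an illustration only; signedInt and leadingMinus are made-up names and have nothing to do with the grammar in the question.

```haskell
import Text.Parsec

-- Hypothetical example: an integer with an optional leading minus sign.
signedInt :: Parsec String () Int
signedInt = do
  -- With no '-', char fails without consuming input and option yields id.
  sign <- option id (char '-' >> return negate)
  n    <- read <$> many1 digit
  return (sign n)

-- optionMaybe reports whether the optional part was present at all.
leadingMinus :: Parsec String () (Maybe Char)
leadingMinus = optionMaybe (char '-')
```

Here parse signedInt "" "-42" gives Right (-42) and parse signedInt "" "7" gives Right 7. Keep in mind that option only falls back to the default when its parser fails before consuming anything; if the parser fails halfway through, the whole parse fails.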

For testing you can do:

tokAndParse = runParser (lexer <* eof) () "" >=> runParser (expr <* eof) () ""

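The eof attached to each parser makes sure that the entire input is consumed; the string cannot be a member of the grammar if there are trailing characters left over. Note that this expr applies at most one parenthesised argument to a term, so your example x(y)(z) is rejected, even though the recursive T rule in your BNF would allow the second argument: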

>tokAndParse "x(y)(z)"
Left (line 1, column 5):
unexpected LParen
expecting end of input

But the following is accepted:

>tokAndParse "(x(y))(z)"
Right (App (App (Var "x") (Var "y")) (Var "z"))
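
The T rule in the question's BNF is recursive, so it allows any number of parenthesised arguments after a term, while option adds at most one. If repeated application is what you want, one possible sketch is to replace option with many and fold the arguments to the left. The helper names tok and parseTerm are my own, and the lexer and token types are repeated only so the snippet loads by itself.

```haskell
{-# LANGUAGE LambdaCase #-}

import Text.Parsec
import Text.Parsec.Prim (tokenPrim)
import Text.Parsec.Pos (incSourceColumn)
import Control.Monad ((>=>))

data Term  = App Term Term | Var String deriving (Show, Eq)
data Token = LParen | RParen | Str String deriving (Show, Eq)

type Lexer  = Parsec String ()
type Parser = Parsec [Token] ()

lexer :: Lexer [Token]
lexer = spaces >> many1 (oneToken <* spaces)
  where
    oneToken = (char '(' >> return LParen)
           <|> (char ')' >> return RParen)
           <|> (Str <$> many1 letter)

-- Hypothetical helper: accept one token when the function returns Just.
-- Each token advances the reported column by one.
tok :: (Token -> Maybe a) -> Parser a
tok = tokenPrim show (\pos _ _ -> incSourceColumn pos 1)

parens :: Parser a -> Parser a
parens = between (tok (only LParen)) (tok (only RParen))
  where only t x = if x == t then Just x else Nothing

var :: Parser Term
var = tok $ \case { Str s -> Just (Var s); _ -> Nothing }

term, expr :: Parser Term
term = parens expr <|> var

-- expr = term T, where T is zero or more parenthesised arguments.
-- foldl App nests the applications to the left: x(y)(z) is (x y) z.
expr = foldl App <$> term <*> many (parens expr)

parseTerm :: String -> Either ParseError Term
parseTerm = runParser (lexer <* eof) () "" >=> runParser (expr <* eof) () ""
```

With this version, parseTerm "x(y)(z)" gives Right (App (App (Var "x") (Var "y")) (Var "z")). The same "parse one item, then many suffixes, then fold" shape is a common way to turn a BNF rule of the form A = B T, T = suffix T | EMPTY into combinators.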
