
Converting Paulson's parser combinators to Haskell

I am trying to convert the code from chapter 9, Writing Interpreters for the λ-Calculus, of Paulson's book ML for the Working Programmer. Can anyone help me translate the following to Haskell? I'm struggling to understand the syntax.

fun list ph = ph -- repeat ("," $-- ph) >> (op::);

fun pack ph = "(" $-- list ph -- $")"      >> #1
              || empty;

In porting this code to Haskell, I see two challenges. One is rewriting the combinators so that they use the type Either SyntaxError rather than exceptions for flow control. The other is preserving the modularity of ML's functors, that is, writing a parser combinator library that is modular with regard to which keywords, symbols and tokenizer it uses.

While the ML code has the two functors

functor Lexical (Keyword: KEYWORD) : LEXICAL
functor Parsing (Lex: LEXICAL) : PARSE

you could start in Haskell with a plain record of keywords and a lexer that takes it as an argument:

data Keyword = Keyword
  { alphas :: [String]
  , symbols :: [String]
  }

data Token
  = Key String
  | Id String
  deriving (Show, Eq)

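-- Note: the Prelude also exports a function named lex, so either
-- rename this one or add `import Prelude hiding (lex)`.
lex :: Keyword -> String -> [Token]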
lex kw s = ...
  where
    alphaTok :: String -> Token
    alphaTok a | a `elem` alphas kw = Key a
               | otherwise = Id a
    ...
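To make the shape of such a lexer concrete, here is a minimal sketch. It is a simplification under stated assumptions and not a line-by-line port of the book's scanner: alphanumeric runs become keywords or identifiers, whitespace is skipped, and for anything else the longest declared symbol is taken, falling back to a single-character Key token. The helper names go and symbolic are hypothetical.

```haskell
import Data.Char (isAlphaNum, isSpace)
import Data.List (isPrefixOf, sortOn)
import Prelude hiding (lex) -- the Prelude exports its own lex

data Keyword = Keyword
  { alphas  :: [String]
  , symbols :: [String]
  }

data Token
  = Key String
  | Id String
  deriving (Show, Eq)

lex :: Keyword -> String -> [Token]
lex kw = go
  where
    go :: String -> [Token]
    go [] = []
    go s@(c : cs)
      | isSpace c    = go cs
      | isAlphaNum c = let (a, rest) = span isAlphaNum s
                       in alphaTok a : go rest
      | otherwise    = let (sy, rest) = symbolic c cs
                       in Key sy : go rest

    alphaTok :: String -> Token
    alphaTok a | a `elem` alphas kw = Key a
               | otherwise          = Id a

    -- Assumption: prefer the longest declared symbol that prefixes the
    -- input; an undeclared character becomes a one-character symbol.
    symbolic :: Char -> String -> (String, String)
    symbolic c cs =
      case filter matches (sortOn (negate . length) (symbols kw)) of
        (sy : _) -> (sy, drop (length sy) (c : cs))
        []       -> ([c], cs)
      where
        matches sy = not (null sy) && sy `isPrefixOf` (c : cs)
```

Because the Keyword record is an ordinary argument, two different languages can share the same lex function simply by passing different records, which is roughly the role the Keyword structure plays for the Lexical functor.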

The ML code uses the types string and substring, whereas Haskell's String is just [Char]. The lexer functions would therefore look a little different; for example, ML's String.getc could simply become the pattern match c : ss1 in Haskell.

Paulson's parsers have type [Token] → (τ, [Token]) but may raise exceptions. The Haskell parsers could instead have type [Token] → Either SyntaxError (τ, [Token]):

newtype SyntaxError = SyntaxError String
  deriving Show

newtype Parser a = Parser { runParser :: [Token] -> Either SyntaxError (a, [Token]) }

err :: String -> Either SyntaxError b
err msg = Left (SyntaxError msg)

The operators id, $, ||, !!, -- and >> need new names in Haskell: id, $, ||, !! and >> are already defined in the Prelude, and -- starts a single-line comment. Possible names are ident for id, kw for $, ||| for ||, +++ for -- and >>> for >>. I would skip implementing the !! operator initially.
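As an illustration of the renaming, here is one way kw, ||| and >>> might look on top of the Parser type above. This is a sketch, not the book's code: the error message, the fixity declarations and the extra Eq instance on SyntaxError (added only so results can be compared) are assumptions.

```haskell
data Token
  = Key String
  | Id String
  deriving (Show, Eq)

newtype SyntaxError = SyntaxError String
  deriving (Show, Eq) -- Eq is an addition for testing

newtype Parser a = Parser { runParser :: [Token] -> Either SyntaxError (a, [Token]) }

-- Assumed fixities: mapping binds tighter than choice.
infixl 3 >>>
infixl 0 |||

-- Stands in for ML's `$ a`: accept exactly the keyword or symbol a.
kw :: String -> Parser String
kw a = Parser f
  where
    f (Key b : toks) | a == b = Right (a, toks)
    f _ = Left (SyntaxError ("Symbol '" ++ a ++ "' expected"))

-- Stands in for ML's `||`: try p, and if it fails, try q on the same input.
(|||) :: Parser a -> Parser a -> Parser a
p ||| q = Parser $ \toks ->
  either (const (runParser q toks)) Right (runParser p toks)

-- Stands in for ML's `>>`: apply a function to the parsed result.
(>>>) :: Parser a -> (a -> b) -> Parser b
p >>> g = Parser $ \toks -> do
  (x, rest) <- runParser p toks
  return (g x, rest)
```

Where the ML version of || catches the SyntaxErr exception raised by its first parser, this version inspects the Left case instead, so no exception handling is involved.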

Here are two of the combinators, implemented a little differently from the ML versions:

ident :: Parser String
ident = Parser f
  where
    f :: [Token] -> Either SyntaxError (String, [Token])
    f (Id x : toks) = Right (x, toks)
    f (Key x : _) = err $ "Identifier expected, got keyword '" ++ x ++ "'"
    f [] = err "Identifier expected, got EOF"

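-- Sequencing: run pa, then run pb on the remaining tokens, and pair the results.
-- The do block runs in the Either SyntaxError monad, so the first failure is returned.
(+++) :: Parser a -> Parser b -> Parser (a, b)
(+++) pa pb = Parser $ \toks1 -> do (x, toks2) <- runParser pa toks1
                                    (y, toks3) <- runParser pb toks2
                                    return ((x, y), toks3)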
    ...
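Putting the pieces together, the two ML functions from the question could be translated roughly as follows. This is a hedged sketch: repeatP (for ML's repeat, which clashes with the Prelude), $++ (for ML's $--), the fixities, the error messages and the Eq instance on SyntaxError are all names or choices made up for this illustration.

```haskell
data Token = Key String | Id String
  deriving (Show, Eq)

newtype SyntaxError = SyntaxError String
  deriving (Show, Eq) -- Eq is an addition for testing

newtype Parser a = Parser { runParser :: [Token] -> Either SyntaxError (a, [Token]) }

-- Assumed fixities, from tightest to loosest.
infixl 6 $++
infixl 5 +++
infixl 3 >>>
infixl 0 |||

ident :: Parser String
ident = Parser f
  where
    f (Id x : toks) = Right (x, toks)
    f _             = Left (SyntaxError "Identifier expected")

kw :: String -> Parser String
kw a = Parser f
  where
    f (Key b : toks) | a == b = Right (a, toks)
    f _ = Left (SyntaxError ("Symbol '" ++ a ++ "' expected"))

(+++) :: Parser a -> Parser b -> Parser (a, b)
pa +++ pb = Parser $ \toks1 -> do
  (x, toks2) <- runParser pa toks1
  (y, toks3) <- runParser pb toks2
  return ((x, y), toks3)

(>>>) :: Parser a -> (a -> b) -> Parser b
p >>> g = Parser $ \toks -> do
  (x, rest) <- runParser p toks
  return (g x, rest)

(|||) :: Parser a -> Parser a -> Parser a
p ||| q = Parser $ \toks ->
  either (const (runParser q toks)) Right (runParser p toks)

-- Succeeds without consuming input, returning the empty list.
empty :: Parser [a]
empty = Parser $ \toks -> Right ([], toks)

-- Keyword followed by a parser, keeping only the parser's result.
($++) :: String -> Parser a -> Parser a
a $++ p = kw a +++ p >>> snd

-- Zero or more repetitions of p.
repeatP :: Parser a -> Parser [a]
repeatP p = p +++ repeatP p >>> uncurry (:) ||| empty

-- fun list ph = ph -- repeat ("," $-- ph) >> (op::);
list :: Parser a -> Parser [a]
list ph = ph +++ repeatP ("," $++ ph) >>> uncurry (:)

-- fun pack ph = "(" $-- list ph -- $")" >> #1 || empty;
pack :: Parser a -> Parser [a]
pack ph = "(" $++ list ph +++ kw ")" >>> fst ||| empty
```

Here uncurry (:) plays the role of ML's (op::) and fst the role of #1. Running runParser (pack ident) on the tokens for (a,b,c) should yield the three identifiers, while on input that does not start with "(" it falls through to empty and consumes nothing.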

Some final remarks:

  • Read the paper Monadic Parsing in Haskell (Hutton, Meijer).

  • You may be interested in SimpleParse by Ken Friis Larsen, an educational parser combinator library that is a simplification of ReadP by Koen Claessen, since its source code is very easy to read. They are both non-deterministic.

  • If you're interested in using parser combinators in Haskell, rather than porting some old-fashioned library for the learning experience, I encourage you to look at Megaparsec (tutorial), a modern fork of Parsec. The implementation is a little complex.

  • None of these three libraries (SimpleParse, ReadP, Megaparsec) split lexing and parsing into two separate steps. Rather, they simply build small tokenizing parsers that implicitly eat meaningless whitespace. (See the token combinator in SimpleParse, for example.) However, Megaparsec does allow an arbitrary token type, whether that is Char or some token you have lexed.
