
Converting Paulson's parser combinators to Haskell

I am trying to convert the code from chapter 9 of Paulson's ML for the Working Programmer, "Writing Interpreters for the λ-Calculus". I was wondering if anyone can help me translate it to Haskell. I'm struggling to understand the syntax.

fun list ph = ph -- repeat ("," $-- ph) >> (op::);

fun pack ph = "(" $-- list ph -- $")"  >> #1
              || empty;

In porting this code to Haskell, I see two challenges. One is rewriting the combinators so that they use the type Either SyntaxError rather than exceptions for flow control. The other is preserving the modularity of ML's functors, that is, writing a parser combinator library that is modular with regard to which keywords, symbols and tokenizer it uses.

While the ML code has the two functors

functor Lexical (Keyword: KEYWORD) : LEXICAL
functor Parsing (Lex: LEXICAL) : PARSE

you could start by having

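import Prelude hiding (lex)  -- the Prelude also exports a function named lex

data Keyword = Keyword
  { alphas :: [String]   -- alphabetic keywords
  , symbols :: [String]  -- symbolic keywords
  }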

data Token
  = Key String
  | Id String
  deriving (Show, Eq)

lex :: Keyword -> String -> [Token]
lex kw s = ...
  where
    alphaTok :: String -> Token
    alphaTok a | a `elem` alphas kw = Key a
               | otherwise = Id a
    ...

The ML code uses the types string and substring, while Haskell's String is actually a [Char]. The lexer functions would look a little different, because ML's Substring.getc can simply become the pattern match c : ss1 in Haskell, and so on.
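To make this concrete, here is a minimal sketch of a complete lexer for the Keyword record above. It is loosely modelled on the structure of Paulson's lexer but is an assumption, not a transcription: alphanumeric runs become identifiers or alphabetic keywords, and a symbol is grown one punctuation character at a time until it matches an entry in `symbols kw`. Note how `c : cs` takes the place of `Substring.getc`, and how `import Prelude hiding (lex)` avoids the name clash with the Prelude.

```haskell
import Prelude hiding (lex)
import Data.Char (isAlphaNum, isPrint, isSpace)

data Keyword = Keyword
  { alphas  :: [String]   -- alphabetic keywords, e.g. "fn"
  , symbols :: [String]   -- symbolic keywords, e.g. "(" or "=>"
  }

data Token
  = Key String
  | Id String
  deriving (Show, Eq)

-- Sketch of a lexer parameterised by the keyword sets.
lex :: Keyword -> String -> [Token]
lex kw = scanning
  where
    -- The pattern c : cs replaces ML's Substring.getc.
    scanning [] = []
    scanning s@(c : cs)
      | isAlphaNum c = let (a, rest) = span isAlphaNum s
                       in alphaTok a : scanning rest
      | isPunct c    = let (sy, rest) = symbolic [c] cs
                       in Key sy : scanning rest
      | otherwise    = scanning cs   -- skip whitespace and control characters

    alphaTok a
      | a `elem` alphas kw = Key a
      | otherwise          = Id a

    -- Extend the symbol one character at a time until it is a declared
    -- symbol or the next character is not punctuation.
    symbolic sy (c : cs)
      | sy `notElem` symbols kw && isPunct c = symbolic (sy ++ [c]) cs
    symbolic sy rest = (sy, rest)

    -- Like ML's Char.isPunct: a visible character that is not alphanumeric.
    isPunct ch = isPrint ch && not (isSpace ch) && not (isAlphaNum ch)
```

For example, with `Keyword ["fn"] ["(", ")", ",", "=>"]`, the input `"fn x => f (x, y)"` lexes to `[Key "fn", Id "x", Key "=>", Id "f", Key "(", Id "x", Key ",", Id "y", Key ")"]`.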

Paulson's parsers have type [Token] → (τ, [Token]) but may raise exceptions. The Haskell parsers could have type [Token] → Either SyntaxError (τ, [Token]):

newtype SyntaxError = SyntaxError String
  deriving Show

newtype Parser a = Parser { runParser :: [Token] -> Either SyntaxError (a, [Token]) }

err :: String -> Either SyntaxError b
err msg = Left (SyntaxError msg)
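Because `Parser` wraps a function that returns `Either`, it can also be given the standard `Functor`, `Applicative` and `Monad` instances. This is the usual state-plus-error pattern, not something taken from Paulson's book, and it lets combinators be written in do-notation. A minimal sketch, repeating the types so it compiles on its own; `anyToken` is a hypothetical primitive added for illustration:

```haskell
data Token = Key String | Id String
  deriving (Show, Eq)

newtype SyntaxError = SyntaxError String
  deriving Show

newtype Parser a = Parser { runParser :: [Token] -> Either SyntaxError (a, [Token]) }

err :: String -> Either SyntaxError b
err msg = Left (SyntaxError msg)

-- Apply a function to the result, leaving the remaining tokens alone.
instance Functor Parser where
  fmap f (Parser p) = Parser $ \toks -> do
    (x, rest) <- p toks
    return (f x, rest)

-- pure consumes nothing; <*> runs the two parsers in sequence.
instance Applicative Parser where
  pure x = Parser $ \toks -> Right (x, toks)
  Parser pf <*> Parser px = Parser $ \toks -> do
    (f, toks1) <- pf toks
    (x, toks2) <- px toks1
    return (f x, toks2)

-- >>= threads the leftover tokens; the first Left short-circuits.
instance Monad Parser where
  Parser p >>= k = Parser $ \toks -> do
    (x, toks1) <- p toks
    runParser (k x) toks1

-- Hypothetical primitive: consume any single token.
anyToken :: Parser Token
anyToken = Parser f
  where
    f (t : ts) = Right (t, ts)
    f []       = err "Unexpected EOF"
```

With these instances, `do { a <- anyToken; b <- anyToken; return (a, b) }` is a parser that reads two tokens and fails with a `SyntaxError` if fewer are available.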

The operators id, $, ||, !!, -- and >> need new names, since they collide with built-in operators (id, $, ||, !! and >> are all exported by the Prelude) and with single-line comments (--). Ideas for names could be: ident (for id), kw (for $), ||| (for ||), +++ (for --) and >>> (for >>). I would skip implementing the !! operator initially.
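Three of the renamed combinators might look like this. It is a sketch, not the answer's own code, and repeats the `Parser` type so the block compiles on its own. `kw` plays the role of Paulson's `$`, `|||` the role of `||`, and `>>>` the role of `>>`. Where Paulson's `||` catches the `SyntaxErr` exception raised by its left operand, `|||` here inspects the `Left` case and reruns the second parser on the original token list. The fixity declarations are an assumption, chosen so that `>>>` binds tighter than `|||` as `>>` does than `||` in Paulson's code.

```haskell
data Token = Key String | Id String
  deriving (Show, Eq)

newtype SyntaxError = SyntaxError String
  deriving Show

newtype Parser a = Parser { runParser :: [Token] -> Either SyntaxError (a, [Token]) }

err :: String -> Either SyntaxError b
err msg = Left (SyntaxError msg)

infixl 3 >>>
infixl 0 |||

-- Counterpart of Paulson's ($ a): accept exactly the keyword a.
kw :: String -> Parser String
kw a = Parser f
  where
    f (Key b : toks) | a == b = Right (b, toks)
    f (tok : _) = err ("Expected '" ++ a ++ "', got " ++ show tok)
    f []        = err ("Expected '" ++ a ++ "', got EOF")

-- Counterpart of ||: if the first parser fails, run the second
-- on the same input (where ML would catch the exception).
(|||) :: Parser a -> Parser a -> Parser a
pa ||| pb = Parser $ \toks ->
  case runParser pa toks of
    Left _ -> runParser pb toks
    ok     -> ok

-- Counterpart of >>: transform the result of a successful parse.
(>>>) :: Parser a -> (a -> b) -> Parser b
pa >>> f = Parser $ \toks -> do
  (x, rest) <- runParser pa toks
  return (f x, rest)
```

For instance, `kw "(" >>> const True ||| kw "[" >>> const False` parses as `(kw "(" >>> const True) ||| (kw "[" >>> const False)`.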

Here are two combinators, implemented in slightly different styles:

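-- Accept an identifier token; report what was found otherwise.
ident :: Parser String
ident = Parser f
  where
    f :: [Token] -> Either SyntaxError (String, [Token])
    f (Id x : toks) = Right (x, toks)
    f (Key x : _) = err $ "Identifier expected, got keyword '" ++ x ++ "'"
    f [] = err "Identifier expected, got EOF"

-- Sequencing (Paulson's --): run pa, then pb on the leftover tokens and
-- pair the results; Either's do-notation propagates the first failure.
(+++) :: Parser a -> Parser b -> Parser (a, b)
pa +++ pb = Parser $ \toks1 -> do
  (x, toks2) <- runParser pa toks1
  (y, toks3) <- runParser pb toks2
  return ((x, y), toks3)
...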
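Putting these pieces together, the two functions from the question could be translated roughly as follows. This is a sketch under the naming scheme above: `emptyP`, `repeatP` and `keyThen` are hypothetical names for Paulson's `empty`, `repeat` and `$--`, and the fixities are assumed (`+++` tighter than `>>>`, which is tighter than `|||`). The types are repeated so the block compiles on its own.

```haskell
data Token = Key String | Id String
  deriving (Show, Eq)

newtype SyntaxError = SyntaxError String
  deriving Show

newtype Parser a = Parser { runParser :: [Token] -> Either SyntaxError (a, [Token]) }

err :: String -> Either SyntaxError b
err msg = Left (SyntaxError msg)

infixl 5 +++
infixl 3 >>>
infixl 0 |||

ident :: Parser String
ident = Parser f
  where
    f (Id x : toks) = Right (x, toks)
    f (Key x : _)   = err ("Identifier expected, got keyword '" ++ x ++ "'")
    f []            = err "Identifier expected, got EOF"

kw :: String -> Parser String                    -- Paulson's $
kw a = Parser f
  where
    f (Key b : toks) | a == b = Right (b, toks)
    f (tok : _) = err ("Expected '" ++ a ++ "', got " ++ show tok)
    f []        = err ("Expected '" ++ a ++ "', got EOF")

(+++) :: Parser a -> Parser b -> Parser (a, b)   -- Paulson's --
pa +++ pb = Parser $ \toks1 -> do
  (x, toks2) <- runParser pa toks1
  (y, toks3) <- runParser pb toks2
  return ((x, y), toks3)

(>>>) :: Parser a -> (a -> b) -> Parser b        -- Paulson's >>
pa >>> f = Parser $ \toks -> do
  (x, rest) <- runParser pa toks
  return (f x, rest)

(|||) :: Parser a -> Parser a -> Parser a        -- Paulson's ||
pa ||| pb = Parser $ \toks -> either (const (runParser pb toks)) Right (runParser pa toks)

-- Hypothetical counterparts of Paulson's empty, repeat and $--.
emptyP :: Parser [a]
emptyP = Parser $ \toks -> Right ([], toks)

repeatP :: Parser a -> Parser [a]
repeatP ph = ph +++ repeatP ph >>> uncurry (:) ||| emptyP

keyThen :: String -> Parser b -> Parser b
keyThen a ph = kw a +++ ph >>> snd

-- fun list ph = ph -- repeat ("," $-- ph) >> (op::);
list :: Parser a -> Parser [a]
list ph = ph +++ repeatP (keyThen "," ph) >>> uncurry (:)

-- fun pack ph = "(" $-- list ph -- $")" >> #1 || empty;
pack :: Parser a -> Parser [a]
pack ph = keyThen "(" (list ph) +++ kw ")" >>> fst ||| emptyP
```

Here `uncurry (:)` stands in for ML's `(op::)`, and `fst`/`snd` for `#1`/`#2`. Running `pack ident` on the tokens of `(x, y)` yields `Right (["x", "y"], [])`, and on input that does not start with `(` it succeeds with `[]` without consuming anything, just as `|| empty` does in the ML version.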

Some final remarks:

  • Read the paper Monadic Parsing in Haskell (Hutton, Meijer).

  • You may be interested in SimpleParse by Ken Friis Larsen, an educational parser combinator library whose source code is very easy to read; it is a simplification of ReadP by Koen Claessen. Both are non-deterministic (a parser can return several alternative parses).

  • If you're interested in using parser combinators in Haskell, rather than porting an old-fashioned library for the learning experience, I encourage you to look at Megaparsec (tutorial), a modern fork of Parsec. Its implementation is somewhat more complex.

  • None of these three libraries (SimpleParse, ReadP, Megaparsec) splits lexing and parsing into two separate steps. Rather, they build small tokenizing parsers that implicitly consume insignificant whitespace. (See the token combinator in SimpleParse, for example.) However, Megaparsec does allow an arbitrary token type, whether that is Char or a token type produced by a separate lexer.
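To illustrate the scannerless style described in the last remark, here is a sketch of Paulson's `pack` written with ReadP from `base` (module `Text.ParserCombinators.ReadP`). Each token-level parser skips the whitespace that follows it. The helper names `token`, `symbol`, `identifier` and `parseAll` are hypothetical, written in the spirit of SimpleParse's `token` but not taken from any library.

```haskell
import Data.Char (isAlphaNum)
import Text.ParserCombinators.ReadP

-- Run a parser, then skip any whitespace after it.
token :: ReadP a -> ReadP a
token p = p <* skipSpaces

symbol :: String -> ReadP String
symbol = token . string

identifier :: ReadP String
identifier = token (munch1 isAlphaNum)

-- Paulson's pack: a parenthesised, comma-separated, non-empty list,
-- or else nothing. <++ is ReadP's left-biased choice.
pack :: ReadP a -> ReadP [a]
pack p = between (symbol "(") (symbol ")") (sepBy1 p (symbol ",")) <++ return []

-- Skip leading whitespace, run the parser, require end of input,
-- and accept only an unambiguous (single) complete parse.
parseAll :: ReadP a -> String -> Maybe a
parseAll p s = case readP_to_S (skipSpaces *> p <* eof) s of
  [(x, "")] -> Just x
  _         -> Nothing
```

Here `parseAll (pack identifier) " ( x , y ) "` gives `Just ["x","y"]`, while `"(x,)"` gives `Nothing`, because after the bracketed alternative fails, the empty alternative leaves input that `eof` rejects.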

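Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For questions, contact yoyou2525@163.com.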

 