
Very simple sexp parser

For an assignment, we had to implement a very basic sexp parser such that, for input like:

"((a b) ((c d) e) f)"

It would return:

[["a", "b"], [["c", "d"], "e"], "f"]

Since this was part of a larger assignment, the parser is only given valid input (matching parens, etc.). I came up with the following solution in Ruby:

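def parse s, start, stop
  # Split the input into delimiter tokens and words.
  tokens = s.scan(/#{Regexp.escape(start)}|#{Regexp.escape(stop)}|\w+/)

  # One open list per unclosed delimiter; the bottom list collects the top level.
  stack = [[]]

  tokens.each do |tok|
    case tok
    when start
      stack << []                 # open a new nested list
    when stop
      stack[-2] << stack.pop      # close it and append it to its parent
    else
      stack[-1] << tok            # append a word to the innermost open list
    end
  end

  # For balanced input only the bottom list remains; return its last element.
  return stack[-1][-1]
end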

It may not be the best solution, but it does the job.
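For comparison, the Ruby stack algorithm above can be ported fairly directly to Haskell as a left fold over already-lexed tokens. This is a minimal sketch, not an idiomatic answer: the Token and Sexp types and the names step and parseStack are hypothetical. Unlike Ruby's array, the top of the stack is the head of a Haskell list, and each open list is kept reversed so that adding an element is a cheap cons.

```haskell
-- Hypothetical token and tree types for this sketch.
data Token = LPar | RPar | Symbol String
  deriving (Show, Eq)

data Sexp = Atom String | List [Sexp]
  deriving (Show, Eq)

-- The stack holds one (reversed) list per unclosed paren; its head is the
-- innermost open list, playing the role of stack[-1] in the Ruby code.
step :: [[Sexp]] -> Token -> [[Sexp]]
step stack LPar                = [] : stack                          -- stack << []
step (top : below : rest) RPar = (List (reverse top) : below) : rest -- stack[-2] << stack.pop
step (top : rest) (Symbol s)   = (Atom s : top) : rest               -- stack[-1] << tok
step _ RPar                    = error "parseStack: unmatched ')'"
step [] (Symbol _)             = error "parseStack: empty stack"

-- Fold over the tokens from one empty bottom list, then take the last
-- element added to it (the head, since the list is reversed), like
-- stack[-1][-1] in Ruby.
parseStack :: [Token] -> Sexp
parseStack tokens = case foldl step [[]] tokens of
  [lastElem : _] -> lastElem
  _              -> error "parseStack: unbalanced or empty input"
```

For example, folding over the tokens of "((a b) ((c d) e) f)" yields List [List [Atom "a", Atom "b"], List [List [Atom "c", Atom "d"], Atom "e"], Atom "f"].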

Now, I'm interested in an idiomatic Haskell solution for the core functionality (i.e. I don't care about the lexing or the choice of delimiters; taking already-lexed input would be fine), if possible using only "core" Haskell, without extensions or libraries like Parsec.

Note that this is NOT part of the assignment; I'm just interested in the Haskell way of doing things.

The value

[["a", "b"], [["c", "d"], "e"], "f"]

does not have a valid type in Haskell (all elements of a list must have the same type), so you'll need to define your own data structure for nested lists, like this:

data NestedList = Value String | Nesting [NestedList]
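To make the type concrete, here is a sketch of the question's expected result written as a NestedList value. The deriving clause and the render helper are additions for illustration, not part of the answer; render (a hypothetical name) prints a value back in the question's bracket notation.

```haskell
import Data.List (intercalate)

-- The answer's type, with Show/Eq derived so values can be printed and compared.
data NestedList = Value String | Nesting [NestedList]
  deriving (Show, Eq)

-- The Ruby result [["a", "b"], [["c", "d"], "e"], "f"] as a NestedList.
example :: NestedList
example = Nesting [ Nesting [Value "a", Value "b"]
                  , Nesting [Nesting [Value "c", Value "d"], Value "e"]
                  , Value "f" ]

-- Hypothetical helper: render in the bracket notation used in the question.
render :: NestedList -> String
render (Value s)    = show s   -- show adds the surrounding double quotes
render (Nesting xs) = "[" ++ intercalate ", " (map render xs) ++ "]"
```

Here render example evaluates to the string [["a", "b"], [["c", "d"], "e"], "f"], so the heterogeneous-looking list becomes a single homogeneous type.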

Now, if you have a list of tokens, where Token is defined as

data Token = LPar | RPar | Symbol String

you can parse it into a list of NestedList values like this:

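-- Parse a token stream into the list of top-level nested lists.
parse :: [Token] -> [NestedList]
parse = fst . parse'

-- parse' returns the elements up to the next unmatched RPar,
-- together with the tokens remaining after that RPar.
parse' :: [Token] -> ([NestedList], [Token])
parse' (LPar : tokens) =
    let (inner, rest) = parse' tokens   -- contents of this nesting
        (next, outer) = parse' rest     -- siblings after its closing RPar
    in
      (Nesting inner : next, outer)
parse' (RPar : tokens) = ([], tokens)
parse' (Symbol str : tokens) =
    let (next, outer) = parse' tokens in
    (Value str : next, outer)
parse' [] = ([], [])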
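The answer assumes the tokens already exist. A minimal sketch of a lexer that produces them is below, assuming parens are single tokens and any other run of non-space characters is a symbol; tokenize is a hypothetical name, and the types plus parse/parse' are repeated from the answer so the block compiles on its own.

```haskell
import Data.Char (isSpace)

data Token = LPar | RPar | Symbol String
  deriving (Show, Eq)

data NestedList = Value String | Nesting [NestedList]
  deriving (Show, Eq)

-- Hypothetical lexer: parens become LPar/RPar, whitespace separates
-- symbols, and any other run of characters becomes one Symbol.
tokenize :: String -> [Token]
tokenize []         = []
tokenize ('(' : cs) = LPar : tokenize cs
tokenize (')' : cs) = RPar : tokenize cs
tokenize s@(c : cs)
  | isSpace c = tokenize cs
  | otherwise = let (sym, rest) = break isDelim s
                in Symbol sym : tokenize rest
  where isDelim x = isSpace x || x == '(' || x == ')'

-- parse / parse' as in the answer above.
parse :: [Token] -> [NestedList]
parse = fst . parse'

parse' :: [Token] -> ([NestedList], [Token])
parse' (LPar : tokens) =
    let (inner, rest) = parse' tokens
        (next, outer) = parse' rest
    in  (Nesting inner : next, outer)
parse' (RPar : tokens) = ([], tokens)
parse' (Symbol str : tokens) =
    let (next, outer) = parse' tokens
    in  (Value str : next, outer)
parse' [] = ([], [])
```

With it, parse (tokenize "((a b) ((c d) e) f)") gives a one-element list: [Nesting [Nesting [Value "a", Value "b"], Nesting [Nesting [Value "c", Value "d"], Value "e"], Value "f"]].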

The idiomatic way in Haskell would be to use Parsec for combinator parsing.
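To give a feel for combinator parsing while staying within base (as the question asks), here is a tiny hand-rolled parser in that style. This is NOT Parsec's API: Parser, satisfy, char, spaces, token and parseSexp are hypothetical names defined in the sketch, and this Maybe-based parser backtracks freely, which Parsec's <|> does not do after input has been consumed.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (isSpace)

data Sexp = Atom String | List [Sexp]
  deriving (Show, Eq)

-- A parser consumes a prefix of the input and returns a value plus the rest.
-- (A newtype, so that the recursive many/some definitions stay lazy.)
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> case pf s of
    Nothing        -> Nothing
    Just (f, rest) -> case pa rest of
      Nothing         -> Nothing
      Just (a, rest') -> Just (f a, rest')

-- Choice: try the first parser, fall back to the second on failure.
instance Alternative Parser where
  empty = Parser (const Nothing)
  Parser p <|> Parser q = Parser $ \s -> case p s of
    Nothing -> q s
    r       -> r

-- Consume one character satisfying the predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \s -> case s of
  (c : cs) | ok c -> Just (c, cs)
  _               -> Nothing

char :: Char -> Parser Char
char c = satisfy (== c)

spaces :: Parser ()
spaces = () <$ many (satisfy isSpace)

-- Run a parser, then skip any trailing whitespace.
token :: Parser a -> Parser a
token p = p <* spaces

sexp :: Parser Sexp
sexp = atom <|> list
  where
    atom = Atom <$> token (some (satisfy isAtomChar))
    list = List <$> (token (char '(') *> many sexp <* token (char ')'))
    isAtomChar c = not (isSpace c || c == '(' || c == ')')

-- Parse a whole string: leading whitespace allowed, all input consumed.
parseSexp :: String -> Maybe Sexp
parseSexp s = case runParser (spaces *> sexp) s of
  Just (e, "") -> Just e
  _            -> Nothing
```

parseSexp "((a b) ((c d) e) f)" returns Just the nested List value, while parseSexp "(a" returns Nothing because the closing paren is missing.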

There are lots of examples online, including:

While fancier parsers like Parsec are nice, you don't really need all that power for this simple case. The classic way to parse is with the ReadS type from the Prelude. That is also how you would give your Sexp type a Read instance.

It's good to be at least a little familiar with this style of parsing, because there are quite a few examples of it in the standard libraries.
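In the Prelude, ReadS a is a synonym for String -> [(a, String)]: a parser returns every (value, remaining input) pair it can produce, and an empty list means no parse. A minimal sketch of two hand-written ReadS parsers (readWord and readTwoWords are hypothetical) shows the list-comprehension sequencing that the solution below also relies on.

```haskell
import Data.Char (isLower, isSpace)

-- Hypothetical example: read one lowercase word, skipping leading spaces.
-- Returns [] (no parse) if no word is found.
readWord :: ReadS String
readWord s = case span isLower (dropWhile isSpace s) of
  ("", _)   -> []
  (w, rest) -> [(w, rest)]

-- Sequencing: feed the leftover input of the first parser into the second.
-- Every combination of results is kept, so ambiguity is handled for free.
readTwoWords :: ReadS (String, String)
readTwoWords s = [ ((w1, w2), s'')
                 | (w1, s')  <- readWord s
                 , (w2, s'') <- readWord s' ]
```

The Prelude's reads is itself a ReadS; for example, reads "42 rest" :: [(Int, String)] evaluates to [(42," rest")].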

Here's one simple solution, in the classic style:

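import Data.Char (isSpace)

data Sexp = Atom String | List [Sexp]
  deriving (Eq, Ord)

-- Print an s-expression back in parenthesized form.
instance Show Sexp where
  show (Atom a ) = a
  show (List es) = '(' : unwords (map show es) ++ ")"

instance Read Sexp where
  -- Skip leading whitespace.
  readsPrec n (c:cs) | isSpace c = readsPrec n cs
  -- A list: read elements up to and including the closing paren.
  readsPrec n ('(':cs)           = [(List es, cs') |
                                      (es, cs') <- readMany n cs]
  readsPrec _ (')':_)            = error "Sexp: unmatched parens"
  -- Anything else is an atom: the longest run of atom characters.
  readsPrec _ cs                 = let (a, cs') = span isAtomChar cs
                                   in [(Atom a, cs')]

-- Read list elements until the closing paren, which is consumed.
readMany :: Int -> ReadS [Sexp]
readMany _ (')':cs)           = [([], cs)]
readMany n (c:cs) | isSpace c = readMany n cs   -- allow whitespace before ')'
readMany n cs                 = [(e : es, cs'') | (e, cs') <- readsPrec n cs,
                                                  (es, cs'') <- readMany n cs']

isAtomChar :: Char -> Bool
isAtomChar '(' = False
isAtomChar ')' = False
isAtomChar c   = not $ isSpace c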

Note that the Int parameter to readsPrec, which usually indicates operator precedence, is not used here.
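A usage sketch for the Read/Show instances above: read parses a string into a Sexp and show prints it back. This copy is self-contained; its readMany skips whitespace before a closing paren so that input like "( (a  b)  c )" is accepted, and normalise is a hypothetical helper added for illustration.

```haskell
import Data.Char (isSpace)

data Sexp = Atom String | List [Sexp]
  deriving (Eq, Ord)

instance Show Sexp where
  show (Atom a)  = a
  show (List es) = '(' : unwords (map show es) ++ ")"

instance Read Sexp where
  readsPrec n (c:cs) | isSpace c = readsPrec n cs
  readsPrec n ('(':cs)           = [(List es, cs') | (es, cs') <- readMany n cs]
  readsPrec _ (')':_)            = error "Sexp: unmatched parens"
  readsPrec _ cs                 = let (a, cs') = span isAtomChar cs
                                   in [(Atom a, cs')]

-- Read elements up to the closing paren, skipping whitespace before it.
readMany :: Int -> ReadS [Sexp]
readMany _ (')':cs)           = [([], cs)]
readMany n (c:cs) | isSpace c = readMany n cs
readMany n cs                 = [(e : es, cs'') | (e, cs') <- readsPrec n cs,
                                                  (es, cs'') <- readMany n cs']

isAtomChar :: Char -> Bool
isAtomChar c = not (isSpace c || c == '(' || c == ')')

-- Hypothetical helper: normalise spacing by reading and re-showing.
normalise :: String -> String
normalise = show . (read :: String -> Sexp)
```

show (read "((a b) ((c d) e) f)" :: Sexp) gives back "((a b) ((c d) e) f)", normalise "( (a  b)  c )" gives "((a b) c)", and reads "(a b) rest" :: [(Sexp, String)] returns the parsed list together with the unconsumed " rest".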

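Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.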

© 2020-2024 STACKOOM.COM