Parsing a specific string format in Haskell

Question

I'm using the parsec Haskell library.

I want to parse strings of the following kind:

[[v1]][[v2]]

xyz[[v1]][[v2]]

[[v1]]xyz[[v2]]

etc.

I'm interested in collecting only the values v1 and v2 and storing them in a data structure.

I tried the following code:

import Text.ParserCombinators.Parsec

quantifiedVars = sepEndBy var (string "]]")
var = between (string "[[") (string "") (many (noneOf "]]"))

parseSL :: String -> Either ParseError [String]
parseSL input = parse quantifiedVars "(unknown)" input

main = do {
   c <- getContents;
   case parse quantifiedVars "(stdin)" c of {
      Left e -> do { putStrLn "Error parsing input:"; print e; };
      Right r -> do{ putStrLn "ok"; mapM_ print r; };
   }
}

With this code, if the input is "[[v1]][[v2]]", the program works fine and prints the following output:

"v1"

"v2"

If the input is "xyz[[v1]][[v2]]", the program fails. I want only what is contained in [[...]], ignoring "xyz".

I also want to store the content of each [[...]] in a data structure.

How can I solve this problem?

Answer 1

You need to restructure your parser. You are applying combinators in odd places, and that breaks things: the closing "]]" is used as the separator for sepEndBy, while between is given an empty string as its closing delimiter.

A var is a varName between "[[" and "]]". So, write exactly that:

var = between (string "[[") (string "]]") varName

A varName should have some kind of format (I don't think you want to accept "%A¤%&", do you?), so you should write a parser for that format; but if it really can be anything, just do this:

varName = many $ noneOf "]"
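If the names do follow a format, a stricter parser could look like the sketch below. The naming rule (a letter followed by letters, digits, or underscores) and the names identName and strictVar are assumptions made up for illustration; the question does not say what a valid name is.

```haskell
import Control.Applicative ((<$>), (<*>))
import Text.ParserCombinators.Parsec

-- Hypothetical naming rule (not from the question):
-- one letter, then any number of letters, digits, or underscores.
identName :: Parser String
identName = (:) <$> letter <*> many (alphaNum <|> char '_')

-- A var whose content must match identName.
strictVar :: Parser String
strictVar = between (string "[[") (string "]]") identName

main :: IO ()
main = do
  print (parse strictVar "" "[[v1]]")  -- accepted: Right "v1"
  print (parse strictVar "" "[[%A]]")  -- rejected: prints a Left with a parse error
```

With this variant, malformed names are reported as parse errors instead of being silently accepted.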

Then, a text containing vars is a sequence of vars separated by non-var text, optionally with non-var text before the first var:

varText = someText *> var `sepEndBy` someText

... where someText is anything except a '[':

someText = many $ noneOf "["
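Put together, the definitions so far form a complete program. The sketch below only adds type signatures, a parseSL wrapper modelled on the one in the question, and a main that runs the three sample inputs; the result is a plain list of strings, which is assumed to be an acceptable "data structure" here.

```haskell
import Control.Applicative ((*>))
import Text.ParserCombinators.Parsec

-- A variable: a name wrapped in "[[" and "]]".
var :: Parser String
var = between (string "[[") (string "]]") varName

-- Any run of characters that are not ']'.
varName :: Parser String
varName = many (noneOf "]")

-- Any run of characters that are not '['.
someText :: Parser String
someText = many (noneOf "[")

-- Optional leading text, then vars separated (and optionally ended) by text.
varText :: Parser [String]
varText = someText *> var `sepEndBy` someText

parseSL :: String -> Either ParseError [String]
parseSL = parse varText "(unknown)"

main :: IO ()
main = mapM_ (print . parseSL)
  [ "[[v1]][[v2]]", "xyz[[v1]][[v2]]", "[[v1]]xyz[[v2]]" ]
```

Each of the three inputs yields Right ["v1","v2"], so the surrounding "xyz" is skipped and only the variable names are collected.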

Things get more complicated if you want input like this to be parseable, where single brackets appear in the text and inside a var:

bla bla [ bla bla [[somevar]blabla]]

Then you need better parsers for varName and someText:

varName = concat <$> many (try incompleteTerminator <|> many1 (noneOf "]"))

-- Parses e.g. "]a"
incompleteTerminator = (\ a b -> [a, b]) <$> char ']' <*> noneOf "]"

someText = concat <$> many (try incompleteInitiator <|> many1 (noneOf "["))

-- Parses e.g. "[b"
incompleteInitiator = (\ a b -> [a, b]) <$> char '[' <*> noneOf "["

PS. (<*>), (*>) and (<$>) are from Control.Applicative.
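For reference, here is the bracket-tolerant version assembled into one runnable file. This is a sketch that reuses the definitions above unchanged; only the type signatures and the main function are additions.

```haskell
import Control.Applicative ((<$>), (<*>), (*>))
import Text.ParserCombinators.Parsec

var :: Parser String
var = between (string "[[") (string "]]") varName

-- Name characters, allowing a single ']' that is not part of "]]".
varName :: Parser String
varName = concat <$> many (try incompleteTerminator <|> many1 (noneOf "]"))

-- Parses e.g. "]a": a ']' followed by something other than ']'.
incompleteTerminator :: Parser String
incompleteTerminator = (\a b -> [a, b]) <$> char ']' <*> noneOf "]"

-- Plain text, allowing a single '[' that is not part of "[[".
someText :: Parser String
someText = concat <$> many (try incompleteInitiator <|> many1 (noneOf "["))

-- Parses e.g. "[b": a '[' followed by something other than '['.
incompleteInitiator :: Parser String
incompleteInitiator = (\a b -> [a, b]) <$> char '[' <*> noneOf "["

varText :: Parser [String]
varText = someText *> var `sepEndBy` someText

main :: IO ()
main = print (parse varText "" "bla bla [ bla bla [[somevar]blabla]]")
-- prints: Right ["somevar]blabla"]
```

The try wrappers matter: without them, a failed incompleteTerminator or incompleteInitiator would already have consumed one bracket, and the real "]]" or "[[" delimiter could no longer be matched.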

Answer 1: 10, accepted, 2012-02-14 15:07:57
