Parsing a particular string in Haskell
I'm using the Parsec library for Haskell.
I want to parse strings of the following kind:
[[v1]][[v2]]
xyz[[v1]][[v2]]
[[v1]]xyz[[v2]]
etc.
I'm interested in collecting only the values v1 and v2 and storing them in a data structure.
I tried the following code:
import Text.ParserCombinators.Parsec
quantifiedVars = sepEndBy var (string "]]")
var = between (string "[[") (string "") (many (noneOf "]]"))
parseSL :: String -> Either ParseError [String]
parseSL input = parse quantifiedVars "(unknown)" input
main = do {
c <- getContents;
case parse quantifiedVars "(stdin)" c of {
Left e -> do { putStrLn "Error parsing input:"; print e; };
Right r -> do{ putStrLn "ok"; mapM_ print r; };
}
}
With this code, if the input is "[[v1]][[v2]]", the program works fine and returns the following output:
"v1"
"v2"
If the input is "xyz[[v1]][[v2]]", the program doesn't work. In particular, I want only what is contained in [[...]], ignoring "xyz".
Also, I want to store the content of [[...]] in a data structure.
How can I solve this problem?
You need to restructure your parser. You are using combinators in very strange places, and they mess things up: between is given an empty closing delimiter, and sepEndBy uses "]]" as the separator, so nothing ever skips over text such as "xyz".
A var is a varName between "[[" and "]]". So, write that:
var = between (string "[[") (string "]]") varName
A varName should have some kind of format (I don't think that you want to accept "%A¤%&", do you?), so you should write a parser for that format. But in case it really can be anything, just do this:
varName = many $ noneOf "]"
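If the names do follow a format, a stricter parser could replace the catch-all one. The sketch below assumes identifier-like names (a letter followed by letters, digits, or underscores); that format is an assumption for illustration and is not stated in the question, and the names varNameStrict and strictVar are hypothetical.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Assumed format: a letter, then any number of letters, digits, or underscores.
varNameStrict :: Parser String
varNameStrict = (:) <$> letter <*> many (alphaNum <|> char '_')

-- A variable whose name must match the assumed format.
strictVar :: Parser String
strictVar = between (string "[[") (string "]]") varNameStrict
```

With this definition, parse strictVar "" "[[v1]]" succeeds with "v1", while an input such as "[[%A]]" is rejected.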
Then, a text containing vars is a sequence of vars separated by non-var text, possibly with non-var text at the start as well:
varText = someText *> var `sepEndBy` someText
... where someText is anything except a '[':
someText = many $ noneOf "["
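Put together, the simple definitions above might look like the following sketch. The type signatures, the trailing eof (which rejects leftover input), and the helper name parseVars are additions for illustration, not part of the original answer.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A variable is a name wrapped in "[[" and "]]".
var :: Parser String
var = between (string "[[") (string "]]") varName

-- Catch-all name: anything up to the next ']'.
varName :: Parser String
varName = many (noneOf "]")

-- Text between variables: anything up to the next '['.
someText :: Parser String
someText = many (noneOf "[")

-- Skip leading text, then collect variables separated by other text.
varText :: Parser [String]
varText = someText *> (var `sepEndBy` someText) <* eof

-- Hypothetical helper that runs the parser on a string.
parseVars :: String -> Either ParseError [String]
parseVars = parse varText "(input)"
```

All three sample inputs from the question, "[[v1]][[v2]]", "xyz[[v1]][[v2]]" and "[[v1]]xyz[[v2]]", should yield Right ["v1","v2"] with this version.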
Things get more complicated if you want input like this to be parseable, where single brackets appear in the surrounding text and inside the variable:
bla bla [ bla bla [[somevar]blabla]]
Then you need a better parser for varName and someText:
varName = concat <$> many (try incompleteTerminator <|> many1 (noneOf "]"))
-- Parses e.g. "]a"
incompleteTerminator = (\ a b -> [a, b]) <$> char ']' <*> noneOf "]"
someText = concat <$> many (try incompleteInitiator <|> many1 (noneOf "["))
-- Parses e.g. "[b"
incompleteInitiator = (\ a b -> [a, b]) <$> char '[' <*> noneOf "["
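As for storing the results in a data structure, one possible sketch wraps each name in a small type and reuses the more tolerant parsers above. The Var newtype, the type signatures, and the helper collectVars are hypothetical names introduced for illustration; a plain list of strings would work just as well.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical wrapper type for a collected variable name.
newtype Var = Var String deriving (Eq, Show)

var :: Parser Var
var = Var <$> between (string "[[") (string "]]") varName

-- Name text: runs without ']' or a single ']' not followed by another ']'.
varName :: Parser String
varName = concat <$> many (try incompleteTerminator <|> many1 (noneOf "]"))

-- Parses e.g. "]a"
incompleteTerminator :: Parser String
incompleteTerminator = (\a b -> [a, b]) <$> char ']' <*> noneOf "]"

-- Ignored text: runs without '[' or a single '[' not followed by another '['.
someText :: Parser String
someText = concat <$> many (try incompleteInitiator <|> many1 (noneOf "["))

-- Parses e.g. "[b"
incompleteInitiator :: Parser String
incompleteInitiator = (\a b -> [a, b]) <$> char '[' <*> noneOf "["

varText :: Parser [Var]
varText = someText *> (var `sepEndBy` someText)

-- Hypothetical helper that runs the parser on a string.
collectVars :: String -> Either ParseError [Var]
collectVars = parse varText "(input)"
```

For the input "bla bla [ bla bla [[somevar]blabla]]" this should give Right [Var "somevar]blabla"], and for "[[v1]]xyz[[v2]]" it should give Right [Var "v1",Var "v2"].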
PS. (<*>), (*>) and (<$>) are from Control.Applicative.