How to express parsing logic in Parsec ParserT monad
I was working on "Write Yourself a Scheme in 48 Hours" to learn Haskell and I've run into a problem I don't really understand. It's for question 2 from the exercises at the bottom of this section.
The task is to rewrite
import Text.ParserCombinators.Parsec

parseString :: Parser LispVal
parseString = do
    char '"'
    x <- many (noneOf "\"")
    char '"'
    return $ String x
such that quotation marks which are properly escaped (e.g. in "This sentence \" is nonsense") get accepted by the parser.
In an imperative language I might write something like this (roughly Pythonic pseudocode):
def parseString(input):
    if input[0] != "\"" or input[len(input)-1] != "\"":
        return error
    input = input[1:len(input) - 1] # slice off quotation marks
    output = "" # This is the 'zero' that accumulates over the following loop
    # If there is a '"' in our string we want to make sure the previous char
    # was '\'
    for n in range(len(input)):
        if input[n] == "\"":
            try:
                if input[n - 1] != "\\":
                    return error
            catch IndexOutOfBoundsError:
                return error
        output += input[n]
    return output
I've been looking at the docs for Parsec and I just can't figure out how to express this as a monadic expression.
I got to this:
parseString :: Parser LispVal
parseString = do
    char '"'
    regular <- try $ many (noneOf "\"\\")
    quote <- string "\\\""
    char '"'
    return $ String $ regular ++ quote
But this only works for one quotation mark, and it has to be at the very end of the string. I can't think of a functional expression that does the work that my loops and if-statements do in the imperative pseudocode.
I appreciate you taking the time to read this and give me advice.
Try something like this:
dq :: Char
dq = '"'

parseString :: Parser LispVal
parseString = do
    _ <- char dq
    -- a backslash starts an escape; anything else except the closing quote is taken as-is
    x <- many ((char '\\' >> escapes) <|> noneOf [dq])
    _ <- char dq
    return $ String x
  where
    escapes = dq   <$ char dq
          <|> '\n' <$ char 'n'
          <|> '\r' <$ char 'r'
          <|> '\t' <$ char 't'
          <|> '\\' <$ char '\\'
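To connect this to the imperative pseudocode in the question: `many` plays the role of the loop, and `<|>` plays the role of the if. Below is a minimal sketch of the same parser with each piece named, so it can be loaded into GHCi on its own. The one-constructor `LispVal` is only a stand-in for the tutorial's type, and the names `escaped`, `plain`, `stringChar` and `readString` are my own labels, not part of Parsec or the tutorial.

```haskell
import Text.ParserCombinators.Parsec

-- Assumption: a one-constructor stand-in for the tutorial's LispVal type.
data LispVal = String String
    deriving (Show, Eq)

-- One escaped character: a backslash followed by a recognised escape letter.
escaped :: Parser Char
escaped = char '\\' >> escapes
  where
    escapes = '"'  <$ char '"'
          <|> '\n' <$ char 'n'
          <|> '\r' <$ char 'r'
          <|> '\t' <$ char 't'
          <|> '\\' <$ char '\\'

-- One ordinary character: anything except the closing quote.
plain :: Parser Char
plain = noneOf "\""

-- The "if" of the loop body: try an escape first, otherwise a plain character.
stringChar :: Parser Char
stringChar = escaped <|> plain

-- The loop itself: `many` repeats stringChar until it fails without
-- consuming input, which here happens at the closing quote.
parseString :: Parser LispVal
parseString = do
    _ <- char '"'
    x <- many stringChar
    _ <- char '"'
    return $ String x

-- Hypothetical helper: run the parser and require that all input is consumed.
readString :: String -> Either ParseError LispVal
readString = parse (parseString <* eof) "readString"
```

In GHCi, `readString "\"This sentence \\\" is nonsense\""` should give a `Right` value whose string contains the unescaped quote, while an unterminated input such as `readString "\"abc"` should give a `Left` parse error.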
The solution is to define a string literal as a starting quote + many valid characters + an ending quote, where a "valid character" is either an escape sequence or a non-quote character.
So there is a one-line change to parseString:
parseString = do char '"'
                 x <- many validChar
                 char '"'
                 return $ String x
and we add the definitions:
validChar = try escapeSequence <|> satisfy (/= '"')
escapeSequence = do { char '\\'; anyChar }
escapeSequence may be refined to allow a limited set of escape sequences.
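One possible refinement, as a rough sketch rather than the only way to do it: accept just a fixed set of escapes with `oneOf` and translate them to the characters they stand for. The particular set (`\"`, `\\`, `\n`, `\r`, `\t`) is my assumption, as is the one-constructor `LispVal` stand-in. Note that this sketch also changes validChar: it drops the `try` and excludes the backslash from the plain characters, because otherwise a rejected escape would backtrack and the backslash would be accepted as an ordinary character.

```haskell
import Text.ParserCombinators.Parsec

-- Assumption: a one-constructor stand-in for the tutorial's LispVal type.
data LispVal = String String
    deriving (Show, Eq)

-- Accept only a fixed set of escapes (an assumed set, adjust as needed).
-- Anything else after a backslash is a parse error.
escapeSequence :: Parser Char
escapeSequence = do
    _ <- char '\\'
    c <- oneOf "\"\\nrt"
    return $ case c of
        'n' -> '\n'
        'r' -> '\r'
        't' -> '\t'
        _   -> c  -- '"' and '\\' stand for themselves

-- A plain character is anything except a quote or a backslash, so a
-- backslash can only ever appear as the start of an escape sequence.
validChar :: Parser Char
validChar = escapeSequence <|> noneOf "\"\\"

parseString :: Parser LispVal
parseString = do
    _ <- char '"'
    x <- many validChar
    _ <- char '"'
    return $ String x
```

With these definitions, an input containing an escape outside the set (for example a backslash followed by `q`) is rejected instead of being passed through.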