How to express parsing logic in Parsec ParserT monad

I was working through "Write Yourself a Scheme in 48 Hours" to learn Haskell and I've run into a problem I don't really understand. It concerns exercise 2 from the exercises at the bottom of this section.

The task is to rewrite

import Text.ParserCombinators.Parsec
parseString :: Parser LispVal
parseString = do
                char '"'
                x <- many (noneOf "\"")
                char '"'
                return $ String x

such that quotation marks which are properly escaped (e.g. in "This sentence \" is nonsense") get accepted by the parser.

In an imperative language I might write something like this (roughly pythonic pseudocode):

def parseString(input):
  if len(input) < 2 or input[0] != "\"" or input[-1] != "\"":
    raise ValueError("string literal must be wrapped in quotation marks")
  input = input[1:-1] # slice off quotation marks
  output = "" # This is the 'zero' that accumulates over the following loop
  # If there is a '"' in our string we want to make sure the previous char
  # was '\'

  for n in range(len(input)):
    if input[n] == "\"":
      # At n == 0 there is no previous char (input[-1] would wrap around
      # to the end in Python), so a leading '"' is always an error.
      if n == 0 or input[n - 1] != "\\":
        raise ValueError("unescaped quotation mark")
    output += input[n]
  return output

I've been looking at the docs for Parsec and I just can't figure out how to work this as a monadic expression.

I got to this:

parseString :: Parser LispVal
parseString = do
                char '"'
                regular <- try $ many (noneOf "\"\\")
                quote <- string "\\\""
                char '"'
                return $ String $ regular ++ quote

But this only works for one quotation mark and it has to be at the very end of the string--I can't think of a functional expression that does the work that my loops and if-statements do in the imperative pseudocode.

I appreciate you taking your time to read this and give me advice.

Try something like this:

dq :: Char
dq = '"'

parseString :: Parser LispVal
parseString = do
  _ <- char dq
  x <- many ((char '\\' >> escapes) <|> noneOf [dq])
  _ <- char dq
  return $ String x
    where
      escapes = dq <$ char dq
            <|> '\n' <$ char 'n'
            <|> '\r' <$ char 'r'
            <|> '\t' <$ char 't'
            <|> '\\' <$ char '\\'

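For anyone who wants to load the snippet above into GHCi, here is a self-contained sketch. Two things in it are assumptions and not part of the answer: LispVal is cut down to the single String constructor (the tutorial's type has more), and readString is a hypothetical helper name added only for trying inputs.

```haskell
import Text.ParserCombinators.Parsec

-- Cut-down stand-in for the tutorial's LispVal (assumption: only the
-- String constructor matters for this exercise).
data LispVal = String String
  deriving (Eq, Show)

dq :: Char
dq = '"'

parseString :: Parser LispVal
parseString = do
  _ <- char dq
  -- Each step of `many` consumes either a backslash escape or one
  -- character that is not a double quote. This replaces the loop and
  -- the if-statement of the imperative version.
  x <- many ((char '\\' >> escapes) <|> noneOf [dq])
  _ <- char dq
  return $ String x
  where
    -- After the backslash, map the next character to what it stands for.
    escapes = dq   <$ char dq
          <|> '\n' <$ char 'n'
          <|> '\r' <$ char 'r'
          <|> '\t' <$ char 't'
          <|> '\\' <$ char '\\'

-- Hypothetical helper for experimenting in GHCi.
readString :: String -> Either ParseError LispVal
readString = parse parseString "demo"
```

In GHCi, readString "\"a\\\"b\"" gives Right (String "a\"b"). A backslash followed by a character outside the list, as in readString "\"a\\qb\"", gives a Left with a parse error, because char '\\' has already consumed input when escapes fails.
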
The solution is to define a string literal as a starting quote, followed by zero or more valid characters, followed by an ending quote, where a "valid character" is either an escape sequence or any character other than a quote.

So there is a one-line change to parseString:

parseString = do char '"'
                 x <- many validChar
                 char '"'
                 return $ String x

and we add the definitions:

validChar = try escapeSequence <|> satisfy ( /= '"' )
escapeSequence = do { char '\\'; anyChar }

escapeSequence may be refined to allow a limited set of escape sequences.
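
One way that refinement might look is sketched below. The accepted set (\", \\, \n, \r, \t) is an assumption borrowed from the other answer, and LispVal is again cut down to a single constructor for the demo. Note one deliberate deviation: the try around escapeSequence is dropped. With try, a rejected escape such as \q would backtrack and the backslash would then be accepted as an ordinary character, which defeats the point of limiting the set.

```haskell
import Text.ParserCombinators.Parsec

-- Cut-down stand-in for the tutorial's LispVal (assumption).
data LispVal = String String
  deriving (Eq, Show)

-- Accept a backslash followed by one of a fixed set of characters and
-- return the character that the escape stands for.
escapeSequence :: Parser Char
escapeSequence = do
  _ <- char '\\'
  c <- oneOf "\"\\nrt"
  return $ case c of
    'n' -> '\n'
    'r' -> '\r'
    't' -> '\t'
    _   -> c  -- '"' and '\\' stand for themselves

-- No `try` here: once the backslash is consumed, an unknown escape is a
-- parse error and the second alternative is not attempted.
validChar :: Parser Char
validChar = escapeSequence <|> satisfy (/= '"')

parseString :: Parser LispVal
parseString = do
  _ <- char '"'
  x <- many validChar
  _ <- char '"'
  return $ String x
```

In GHCi, parse parseString "" "\"a\\\"b\"" gives Right (String "a\"b"), while an input containing \q is rejected.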
