Designing parsing code using Parsec

While following the tutorial Write Yourself a Scheme in 48 Hours, I tried to extend my parsing code to support hexadecimal, octal, binary and decimal literals.

import Text.ParserCombinators.Parsec hiding (spaces)
import Control.Monad
import Numeric

hexChar :: Parser Char
hexChar = ...

octChar :: Parser Char
octChar = ...

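hexNumber :: Parser Integer
hexNumber = do
  char '#'
  char 'x'
  s <- many1 hexChar
  -- readHex returns a list of (value, rest) pairs; take the first value
  return $ (fst . head) (readHex s)


octNumber :: Parser Integer
octNumber = do
  char '#'
  char 'o'
  s <- many1 octChar
  -- same idea as above, but reading base-8 digits
  return $ (fst . head) (readOct s)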

If we forget about decimal and binary numbers in this discussion:

parseNumber :: Parser Integer
parseNumber = hexNumber <|> octNumber

Then this parser fails to recognize octal numbers. This seems to be related to the number of lookahead characters needed to tell an octal number from a hexadecimal one (if we drop the leading '#' from the syntax, the parser works). Hence it seems we are forced to revisit the code and 'factor out' the leading '#', so to speak, by dropping the char '#' from the individual parsers and defining:

parseNumber = char '#' >> (hexNumber <|> octNumber)

This works, but I find the code less pleasant. If I have a function called hexNumber, I would expect it to recognize #xffff (which is proper Scheme syntax) and not xffff. Is this something I have to live with, or is there a way around this 'forced factorization' of the leading character '#'?

If the first argument of (<|>) fails after having consumed some input, the whole alternative fails immediately without trying the second argument. If a failure in the first argument should lead to a retry with the second argument, wrap the part that may fail in try: when a parser wrapped in try fails, it behaves as if it had consumed no input. In hexNumber you must consume '#' only if the following character is 'x'.

hexNumber :: Parser Integer
hexNumber = do
  -- match "#x" atomically: on failure, no input counts as consumed
  try $ char '#' >> char 'x'
  s <- many1 hexChar
  return $ (fst . head) (readHex s)


octNumber :: Parser Integer
octNumber = do
  -- match "#o" atomically, as above
  try $ char '#' >> char 'o'
  s <- many1 octChar
  return $ (fst . head) (readOct s)
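
To see the difference try makes, here is a minimal sketch that runs the same input through an alternative with and without it. It rests on a few assumptions: Parsec's built-in hexDigit and octDigit stand in for the elided hexChar and octChar, and the names hexNoTry, hexWithTry, octWithTry and demo are made up for this illustration.

```haskell
import Text.ParserCombinators.Parsec
import Numeric (readHex, readOct)

-- Without try: '#' is consumed before the check for 'x' fails,
-- so (<|>) will not fall through to the next alternative.
hexNoTry :: Parser Integer
hexNoTry = do
  _ <- char '#'
  _ <- char 'x'
  s <- many1 hexDigit          -- hexDigit: assumed stand-in for hexChar
  return (fst (head (readHex s)))

-- With try: a failed "#x" prefix counts as consuming no input.
hexWithTry :: Parser Integer
hexWithTry = do
  _ <- try (char '#' >> char 'x')
  s <- many1 hexDigit
  return (fst (head (readHex s)))

octWithTry :: Parser Integer
octWithTry = do
  _ <- try (char '#' >> char 'o')
  s <- many1 octDigit          -- octDigit: assumed stand-in for octChar
  return (fst (head (readOct s)))

-- Hypothetical helper that prints the three results.
demo :: IO ()
demo = do
  print (parse (hexNoTry   <|> octWithTry) "" "#o17")  -- a Left: parse error
  print (parse (hexWithTry <|> octWithTry) "" "#o17")  -- Right 15
  print (parse (hexWithTry <|> octWithTry) "" "#xff")  -- Right 255
```

Calling demo from GHCi should show the first parse failing on the 'o' that follows '#', while the two try-based parses succeed.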

Note that the try-based version is somewhat inefficient, since '#' is parsed twice whenever the first alternative fails, and it gets worse as the common prefix gets longer and more complex.
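
If the repeated parsing matters, one possible middle ground is to factor the prefix out internally while still exposing parsers that accept the full Scheme syntax. This is only a sketch under assumptions: hexBody and octBody are hypothetical helper names, and Parsec's hexDigit and octDigit again stand in for the elided hexChar and octChar.

```haskell
import Text.ParserCombinators.Parsec
import Numeric (readHex, readOct)

-- Bodies: everything after the leading '#'. Each one fails on its
-- first character without consuming input, so no try is needed.
hexBody :: Parser Integer
hexBody = do
  _ <- char 'x'
  s <- many1 hexDigit
  return (fst (head (readHex s)))

octBody :: Parser Integer
octBody = do
  _ <- char 'o'
  s <- many1 octDigit
  return (fst (head (readOct s)))

-- Standalone parsers still recognize "#xffff" and "#o17".
hexNumber, octNumber :: Parser Integer
hexNumber = char '#' >> hexBody
octNumber = char '#' >> octBody

-- The combined parser reads '#' once, then dispatches on the radix letter.
parseNumber :: Parser Integer
parseNumber = char '#' >> (hexBody <|> octBody)
```

With this layout hexNumber keeps the meaning its name suggests, and parseNumber avoids backtracking over the shared prefix, at the cost of two extra helper definitions.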


 