Designing parsing code using Parsec

In the course of following the tutorial Write Yourself a Scheme in 48 Hours, I was attempting to enhance my parsing code to support hexadecimal, octal, binary and decimal literals.

import Text.ParserCombinators.Parsec hiding (spaces)
import Control.Monad
import Numeric

hexChar :: Parser Char
hexChar = ...

octChar :: Parser Char
octChar = ...

-- Parses a hexadecimal literal such as #xff
hexNumber :: Parser Integer
hexNumber = do
  char '#'
  char 'x'
  s <- many1 hexChar
  return $ (fst . head) (readHex s)

-- Parses an octal literal such as #o17
octNumber :: Parser Integer
octNumber = do
  char '#'
  char 'o'
  s <- many1 octChar
  return $ (fst . head) (readOct s)

If we leave decimal and binary numbers out of this discussion:

parseNumber :: Parser Integer
parseNumber = hexNumber <|> octNumber

Then this parser will fail to recognize octal numbers. This seems to be related to the number of lookahead characters required to tell octal numbers apart from hexadecimal ones (if we drop the leading '#' from the syntax, the parser works). Hence it seems we are forced to revisit the code and 'factor out' the leading '#', so to speak, by dropping the char '#' from the individual parsers and defining:

parseNumber = char '#' >> (hexNumber <|> octNumber)

This is fine, but I find the code less pleasant. If I have a function called hexNumber, I would expect it to recognize #xffff (which is proper Scheme syntax) and not xffff. Is this something I have to live with, or are there ways around this 'forced factoring' of the leading character #?
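The behaviour described in the question can be reproduced with a minimal sketch. Since the original hexChar and octChar are elided, this sketch assumes Parsec's built-in hexDigit and octDigit in their place; runNumber is a hypothetical helper added only to make the result easy to inspect.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Numeric (readHex, readOct)

-- Assumption: hexDigit and octDigit stand in for the elided hexChar and octChar.

-- Each parser consumes the leading '#' itself, as in the question.
hexNumber :: Parser Integer
hexNumber = do
  _ <- char '#'
  _ <- char 'x'
  s <- many1 hexDigit
  -- head is safe here: many1 guarantees at least one valid digit
  return (fst (head (readHex s)))

octNumber :: Parser Integer
octNumber = do
  _ <- char '#'
  _ <- char 'o'
  s <- many1 octDigit
  return (fst (head (readOct s)))

-- hexNumber consumes '#' before failing on 'o', so (<|>) never tries octNumber.
parseNumber :: Parser Integer
parseNumber = hexNumber <|> octNumber

-- Hypothetical helper: Nothing on a parse error, Just n on success.
runNumber :: String -> Maybe Integer
runNumber = either (const Nothing) Just . parse parseNumber ""
```

With these definitions, runNumber "#xff" gives Just 255, while runNumber "#o17" gives Nothing, even though octNumber on its own accepts that input.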

If the first argument of (<|>) fails after having consumed some input, the whole alternative fails immediately without trying the second argument. If a failure in the first argument should lead to a retry with the second argument, you can wrap it in try, which makes a failed attempt behave as if it had consumed no input. In hexNumber you must consume '#' only if the following character is 'x'.

hexNumber :: Parser Integer
hexNumber = do
  -- try rewinds the input if "#x" does not match in full
  try $ char '#' >> char 'x'
  s <- many1 hexChar
  return $ (fst . head) (readHex s)


octNumber :: Parser Integer
octNumber = do
  try $ char '#' >> char 'o'
  s <- many1 octChar
  return $ (fst . head) (readOct s)
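For reference, a self-contained version of the try approach might look like the sketch below. It again assumes hexDigit and octDigit in place of the elided hexChar and octChar, and runNumber is a hypothetical helper that is not part of the original code.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Numeric (readHex, readOct)

-- Assumption: hexDigit and octDigit stand in for the elided hexChar and octChar.

hexNumber :: Parser Integer
hexNumber = do
  -- If "#x" does not match in full, try makes the failure consume no input.
  _ <- try (char '#' >> char 'x')
  s <- many1 hexDigit
  return (fst (head (readHex s)))

octNumber :: Parser Integer
octNumber = do
  _ <- try (char '#' >> char 'o')
  s <- many1 octDigit
  return (fst (head (readOct s)))

-- Both alternatives now start from the same input position.
parseNumber :: Parser Integer
parseNumber = hexNumber <|> octNumber

-- Hypothetical helper: Nothing on a parse error, Just n on success.
runNumber :: String -> Maybe Integer
runNumber = either (const Nothing) Just . parse parseNumber ""
```

Here runNumber "#xff" gives Just 255 and runNumber "#o17" gives Just 15. An unsupported prefix such as "#b1" is still rejected, because neither try succeeds.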

Note that this is somewhat inefficient, since '#' is parsed twice whenever the first alternative fails, and it gets worse as the common prefix gets longer and more complex.
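One possible way to avoid re-parsing the prefix while keeping parsers that accept the full Scheme syntax is to split each literal into a body parser and a thin wrapper. This is only a sketch of that design: radixBody, hexBody and octBody are hypothetical names, and hexDigit and octDigit are assumed in place of the elided hexChar and octChar.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Numeric (readHex, readOct)

-- Hypothetical helper: a radix letter followed by one or more digits.
radixBody :: Char -> Parser Char -> (String -> Integer) -> Parser Integer
radixBody tag digit convert = char tag >> (convert <$> many1 digit)

-- Bodies parse everything after the '#'.
hexBody, octBody :: Parser Integer
hexBody = radixBody 'x' hexDigit (fst . head . readHex)
octBody = radixBody 'o' octDigit (fst . head . readOct)

-- Standalone parsers still recognize the full literal, e.g. #xffff.
hexNumber, octNumber :: Parser Integer
hexNumber = char '#' >> hexBody
octNumber = char '#' >> octBody

-- Combined parser: '#' is consumed once, and the bodies differ in their
-- first character, so no try is needed.
parseNumber :: Parser Integer
parseNumber = char '#' >> (hexBody <|> octBody)

-- Hypothetical helper: Nothing on a parse error, Just n on success.
runWith :: Parser Integer -> String -> Maybe Integer
runWith p = either (const Nothing) Just . parse p ""
```

The trade-off is that parseNumber is built from the bodies rather than from hexNumber and octNumber directly, so the factoring still exists but is hidden behind the named parsers.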
