
Parsing a sum datatype with parsec

I am trying to figure out the best way to parse a sum datatype in Haskell. This is an extract of what I attempted:

type Value = Int

data Operator = ADD | SUB | MUL | DIV | SQR deriving (Show)

toOperator :: String -> Maybe Operator
toOperator "ADD" = Just ADD
toOperator "SUB" = Just SUB
toOperator "MUL" = Just MUL
toOperator "DIV" = Just DIV
toOperator "SQR" = Just SQR
toOperator _     = Nothing

parseOperator :: ParsecT String u Identity Operator
parseOperator = do
    s <- choice $ map (try . string) ["ADD", "SUB", "MUL", "DIV", "SQR"]
    case toOperator s of
        Just x  -> return x
        Nothing -> fail "Could not parse that operator."

This code does what I want but has one obvious problem: it checks the data twice, once in the line choice $ map (try . string) ["ADD", "SUB", "MUL", "DIV", "SQR"] and once through toOperator.

What I want is to parse a string into an Operator if it occurs in the list, and fail otherwise. But I can't figure out how to do this in a 'clean' way.

It's simpler if you make toOperator participate in the Parsec parsing process directly, rather than having it be a step that happens separately, because then "whether this thing is a valid operator" can provide feedback into the parsing process.

For this specific case where the thing you are parsing is a zero-field enum whose constructor names exactly match the strings you are parsing, there are already several good shortcuts posted, showing you how to concisely parse those constructors. In this answer, I will show an alternative method, which is easier to adapt to the general case of "match one of several cases" and to handle fancier stuff like "one of the three constructors has an Int argument but the others don't."

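-- StringParser is not defined in this answer; it is assumed to be a
-- Parsec parser type over String input.
operator :: StringParser Operator
operator = string "ADD" *> pure ADD
       <|> string "DIV" *> pure DIV
       <|> string "MUL" *> pure MUL
       <|> try (string "SUB") *> pure SUB  -- try: "SUB" and "SQR" share the prefix "S"
       <|> string "SQR" *> pure SQR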

Now suppose that you had an additional constructor, VAR, taking a String argument. It is easy to add support for that constructor to this parser:

operator :: StringParser Operator
operator = ...
       <|> string "VAR" *> (VAR <$> var)

var :: StringParser String
var = spaces *> anyChar `manyTill` space
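Putting the two fragments above together, a complete module might look like the following sketch. The `StringParser` alias is an assumption (the answer never defines it), and `Operator` is extended with the `VAR String` constructor described above; everything else follows the answer's code.

```haskell
import Text.Parsec

-- Assumption: the answer does not define StringParser; this alias is one
-- plausible reading (a Parsec parser over String with no user state).
type StringParser = Parsec String ()

-- The question's Operator, extended with the VAR constructor from the answer.
data Operator = ADD | SUB | MUL | DIV | SQR | VAR String
  deriving (Show, Eq)

operator :: StringParser Operator
operator = string "ADD" *> pure ADD
       <|> string "DIV" *> pure DIV
       <|> string "MUL" *> pure MUL
       <|> try (string "SUB") *> pure SUB  -- backtrack so "SQR" can still match
       <|> string "SQR" *> pure SQR
       <|> string "VAR" *> (VAR <$> var)

-- Skips leading whitespace, then collects characters up to the next
-- whitespace character (which must be present in the input).
var :: StringParser String
var = spaces *> anyChar `manyTill` space
```

In GHCi, `parse operator "" "SQR"` yields `Right SQR` and `parse operator "" "VAR foo "` yields `Right (VAR "foo")`. Note that `var` as written needs a whitespace character after the name, so `"VAR foo"` with nothing after it would fail.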

You have several options to avoid such duplication.

First, if the names as they appear in the input match the constructors of Operator exactly (which seems to be the case in your example), you can avoid toOperator altogether by also deriving a Read instance for Operator and just using read. The code would then be along the lines of

parseOperator :: ParsecT String u Identity Operator
parseOperator = do
    s <- choice $ map (try . string) ["ADD", "SUB", "MUL", "DIV", "SQR"]
    pure $ read s

You'll have to be careful to list the same names here as the Operator constructors and update them as necessary, though.

Second, you can instead build the mapping yourself by defining a list (or a Data.Map, or HashMap) and then use it both to specify the admissible input and to find the corresponding operator constructor:

operators :: [(String, Operator)]
operators = [("ADD", ADD), ("SUB", SUB), ("MUL", MUL), ("DIV", DIV), ("SQR", SQR)]

parseOperator :: ParsecT String u Identity Operator
parseOperator = do
    s <- choice $ map (try . string . fst) operators
    case lookup s operators of
        Just x  -> return x
        Nothing -> fail "Could not parse that operator."

Note that the case is not really necessary for a well-defined parser: the result of the parse will by definition be in the operators list. And, again, the downside is that you have to keep the operators list and the constructors in sync.
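One way to drop the redundant lookup entirely is to pair each string parser with its constructor when building the alternatives. The sketch below assumes the same `operators` list as above and uses `<$` to replace the matched string with the constructor; it is an illustration, not the answer's own code.

```haskell
import Data.Functor.Identity (Identity)
import Text.Parsec

data Operator = ADD | SUB | MUL | DIV | SQR deriving (Show, Eq)

operators :: [(String, Operator)]
operators = [("ADD", ADD), ("SUB", SUB), ("MUL", MUL), ("DIV", DIV), ("SQR", SQR)]

-- Each alternative parses one name and returns the constructor paired with
-- it, so no lookup (and no impossible Nothing branch) is needed afterwards.
parseOperator :: ParsecT String u Identity Operator
parseOperator = choice [ op <$ try (string name) | (name, op) <- operators ]
```

With this definition, `parse parseOperator "" "MUL"` yields `Right MUL`, and an input that matches none of the names produces a parse error from `choice` itself.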

The third, and perhaps the sweetest, option is to generate the list of operators automatically using two extra type classes, Bounded and Enum, which together allow enumerating all the constructors of a type like yours, and which GHC will happily derive for your Operator. The operators definition would then look like

operators :: [(String, Operator)]
operators = map (\op -> (show op, op)) $ enumFromTo minBound maxBound
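The same idea can be generalised to any enumeration whose constructor names match the input text. The helper name `enumParser` below is hypothetical (it is not part of Parsec or of the answer); the sketch assumes the type derives Show, Enum and Bounded.

```haskell
import Data.Functor.Identity (Identity)
import Text.Parsec

data Operator = ADD | SUB | MUL | DIV | SQR
  deriving (Show, Eq, Enum, Bounded)

-- Hypothetical helper: tries every constructor of the type in declaration
-- order, matching the text produced by show and returning that constructor.
enumParser :: (Show a, Enum a, Bounded a) => ParsecT String u Identity a
enumParser = choice [ x <$ try (string (show x)) | x <- [minBound .. maxBound] ]

parseOperator :: ParsecT String u Identity Operator
parseOperator = enumParser
```

Because the alternatives are tried in declaration order, a constructor whose name is a prefix of a later one (say, a hypothetical SQ declared before SQR) would win on input "SQR"; ordering the constructors or following the name with a delimiter check would be needed in that case.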

You just need a function from String to Operator (a total counterpart of toOperator) to map over the parser; read is a simple (if not robust) example.

>>> data Operator = ADD | SUB | MUL | DIV | SQR deriving (Show, Read)
>>> parse (read <$> string "ADD") "" "ADD" :: Either ParseError Operator
Right ADD
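Since read throws an exception on input it cannot convert, a somewhat more robust variant is to parse a whole word and turn a failed conversion into a parse failure with readMaybe from Text.Read. This is a sketch under the assumption that operators are written as runs of uppercase letters; it is not part of the answer above.

```haskell
import Data.Functor.Identity (Identity)
import Text.Parsec
import Text.Read (readMaybe)

data Operator = ADD | SUB | MUL | DIV | SQR deriving (Show, Read, Eq)

-- Reads a run of uppercase letters and converts it with the derived Read
-- instance. An unknown word becomes a parse error; try makes that failure
-- consume no input, so other alternatives can still be attempted.
parseOperator :: ParsecT String u Identity Operator
parseOperator = try $ do
  word <- many1 upper
  case readMaybe word of
    Just op -> pure op
    Nothing -> unexpected ("operator " ++ show word)
```

Unlike the string-based parsers, this version consumes the whole uppercase word first, so an input such as "ADDX" is rejected rather than parsed as ADD with "X" left over.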
