I am trying to figure out the best way to parse a sum datatype in Haskell. This is an extract of what I attempted:
import Text.Parsec
import Data.Functor.Identity (Identity)
type Value = Int
data Operator = ADD | SUB | MUL | DIV | SQR deriving (Show)
toOperator :: String -> Maybe Operator
toOperator "ADD" = Just ADD
toOperator "SUB" = Just SUB
toOperator "MUL" = Just MUL
toOperator "DIV" = Just DIV
toOperator "SQR" = Just SQR
toOperator _ = Nothing
parseOperator :: ParsecT String u Identity Operator
parseOperator = do
  s <- choice $ map (try . string) ["ADD", "SUB", "MUL", "DIV", "SQR"]
  case toOperator s of
    Just x -> return x
    Nothing -> fail "Could not parse that operator."
This code does what I want but has one obvious problem: it checks the data twice, once in the line choice $ map (try . string) ["ADD", "SUB", "MUL", "DIV", "SQR"] and once through toOperator.
What I want is to parse a string into an Operator if it occurs in the list, and fail otherwise. But I can't figure out how to do this in a 'clean' way.
It's simpler if you make toOperator participate in the Parsec parsing process directly, rather than having it be a step that happens separately, because then "whether this thing is a valid operator" can provide feedback into the parsing process.
For this specific case where the thing you are parsing is a zero-field enum whose constructor names exactly match the strings you are parsing, there are already several good shortcuts posted, showing you how to concisely parse those constructors. In this answer, I will show an alternative method, which is easier to adapt to the general case of "match one of several cases" and to handle fancier stuff like "one of the three constructors has an Int argument but the others don't."
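-- StringParser is not defined in this extract; assumed here to be a plain String parser.
type StringParser = Parsec String ()
operator :: StringParser Operator
operator = string "ADD" *> pure ADD
  <|> string "DIV" *> pure DIV
  <|> string "MUL" *> pure MUL
  -- try is needed because "SUB" and "SQR" share their first letter
  <|> try (string "SUB") *> pure SUB
  <|> string "SQR" *> pure SQR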
Now suppose that you had an additional constructor, VAR, taking a String argument. It is easy to add support for that constructor to this parser:
operator :: StringParser Operator
operator = ...
  <|> string "VAR" *> (VAR <$> var)
var :: StringParser String
var = spaces *> anyChar `manyTill` space
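To make the idea above concrete, here is a minimal self-contained sketch of such a parser. It makes a few assumptions that are not in the original: it uses the Parser alias from Text.Parsec.String in place of StringParser, writes each case with <$ instead of *> pure, and reads the variable name as a run of non-space characters so that it also works at the end of the input.

```haskell
import Data.Char (isSpace)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Operator extended with the VAR constructor discussed above.
data Operator = ADD | SUB | MUL | DIV | SQR | VAR String
  deriving (Show, Eq)

-- Each alternative matches one keyword and yields the matching constructor.
-- `try` is needed on "SUB" because it shares its first letter with "SQR":
-- without it, a failed match on "SUB" would already have consumed the 'S'.
operator :: Parser Operator
operator =
      ADD <$ string "ADD"
  <|> DIV <$ string "DIV"
  <|> MUL <$ string "MUL"
  <|> SUB <$ try (string "SUB")
  <|> SQR <$ string "SQR"
  <|> VAR <$> (string "VAR" *> var)

-- Assumption: a variable name is a non-empty run of non-space characters,
-- optionally preceded by whitespace.
var :: Parser String
var = spaces *> many1 (satisfy (not . isSpace))
```

With these definitions, parse operator "" "VAR x" gives Right (VAR "x"), and an unknown keyword gives a Left with a parse error.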
You have several options to avoid such duplication.
First, if the names as they appear in the input you are parsing exactly match the constructors of Operator (which seems to be the case in your example), you can avoid toOperator altogether by also deriving the Read instance for Operator and just using read. The code would then be along the lines of:
parseOperator :: ParsecT String u Identity Operator
parseOperator = do
  s <- choice $ map (try . string) ["ADD", "SUB", "MUL", "DIV", "SQR"]
  pure $ read s
You'll have to be careful to list the same names here as the Operator constructors and update them as necessary, though.
Second, you can instead build the mapping yourself by defining a list (or a Data.Map, or a HashMap) and then use it both to specify the admissible input and to find the corresponding operator constructor:
operators :: [(String, Operator)]
operators = [("ADD", ADD), ("SUB", SUB), ("MUL", MUL), ("DIV", DIV), ("SQR", SQR)]
parseOperator :: ParsecT String u Identity Operator
parseOperator = do
  s <- choice $ map (try . string . fst) operators
  case lookup s operators of
    Just x -> return x
    Nothing -> fail "Could not parse that operator."
Note that the case is not really necessary for a well-defined parser: the result of the parse will by definition be in the operators list. And, again, the downside is that you have to keep the operators list and the constructors in sync.
The third, and perhaps the sweetest, option is to generate the list of operators automatically with two extra type classes, Bounded and Enum, which, combined, allow enumerating all the constructors of a type like yours, and which GHC will happily derive for your Operator. The definition of operators would then look like:
operators :: [(String, Operator)]
operators = map (\op -> (show op, op)) $ enumFromTo minBound maxBound
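As a rough sketch of how this third option might be wired into a complete parser: the version below assumes Operator also derives Enum and Bounded, uses the Parser alias from Text.Parsec.String for brevity, and pairs each name with its constructor directly so that no second lookup on the matched string is needed. The parseOperator shown here is one possible formulation, not the only one.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

data Operator = ADD | SUB | MUL | DIV | SQR
  deriving (Show, Eq, Enum, Bounded)

-- Every constructor paired with its name, generated from the derived instances.
operators :: [(String, Operator)]
operators = [(show op, op) | op <- [minBound .. maxBound]]

-- Try each name in turn and return the constructor paired with it.
-- `try` lets a partially matched name (e.g. "SUB" on input "SQR") backtrack.
parseOperator :: Parser Operator
parseOperator = choice [op <$ try (string name) | (name, op) <- operators]
```

Adding a constructor to Operator now extends the parser automatically. One caveat: if one constructor name were a prefix of another, the order of the alternatives would matter, since the shorter name would match first.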
You just need a total function from the matched string to an Operator (in place of toOperator, which returns a Maybe) to map over the parser; read is a simple (if not robust) example.
>>> data Operator = ADD | SUB | MUL | DIV | SQR deriving (Show, Read)
>>> parse (read <$> string "ADD") "" "ADD" :: Either ParseError Operator
Right ADD
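If the partiality of read is a concern, one possible variation is to use readMaybe from Text.Read and turn a failed read into a parse failure. The sketch below assumes that operator names are runs of upper-case letters and uses the Parser alias from Text.Parsec.String; both choices are illustrative rather than taken from the answer above.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Read (readMaybe)

data Operator = ADD | SUB | MUL | DIV | SQR
  deriving (Show, Read, Eq)

-- Read a run of upper-case letters, then let the derived Read instance decide
-- whether it names a constructor. An unknown word becomes a parse failure,
-- and `try` makes that failure backtrack over the consumed letters.
parseOperator :: Parser Operator
parseOperator = try $ do
  word <- many1 upper
  maybe (unexpected ("operator " ++ show word)) pure (readMaybe word)
```

Here the list of valid names lives only in the data declaration, at the cost of tying the input syntax to the constructor names.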