I would like to parse an Intel Hex Record with parsec using the applicative functor style. A typical records looks like the following:
:10010000214601360121470136007EFE09D2190140
The first character is always ':', the next two characters are a hex string representing the number of bytes in the record. The next four characters are a hex string identifying the start address of the data. I had code like the following, but I don't know how to applicatively pass the byte count to the parser that parses the data bytes. My non-working code looks like the following.
line = startOfRecord . byteCount . address . recordType . recordData . checksum
startOfRecord = char ':'
byteCount = toHexValue <$> count 2 hexDigit
address = toHexValue <$> count 4 hexDigit
recordType = toHexValue <$> count 2 hexDigit
recordData c = toHexValue <$> count c hexDigit
recordData c CharParser = count c hexDigit
checksum = toHexValue <$> count 2 hexDigit
toHexValue :: String -> Int
toHexValue = fst . head . readHex
Could anyone help me? Thanks.
There are a number of things not included in your question that you need in order to use parsec. To define things like startOfRecord
, we need to disable the dreaded monomorphism restriction. If we want to write type signatures for anything like startOfRecord
we also need to enable FlexibleContexts
. We also need to import parsec, Control.Applicative
, and Numeric (readHex)
{-# LANGUAGE NoMonomorphismRestriction #-}
{-# LANGUAGE FlexibleContexts #-}
import Text.Parsec
import Control.Applicative
import Numeric (readHex)
I'm also going to use Word8
and Word16
from Data.Word
since they exactly match the types used in intel hex records.
import Data.Word
Ignoring the recordData
for a momement, we can define how to read hex values for bytes ( Word8
) and 16 bit integer addresses ( Word16
).
hexWord8 :: (Stream s m Char) => ParsecT s u m Word8
hexWord8 = toHexValue <$> count 2 hexDigit
hexWord16 :: (Stream s m Char) => ParsecT s u m Word16
hexWord16 = toHexValue <$> count 4 hexDigit
toHexValue :: (Num a, Eq a) => String -> a
toHexValue = fst . head . readHex
This lets us define all of the pieces except for recordData
.
startOfRecord = char ':'
byteCount = hexWord8
address = hexWord16
recordType = hexWord8
checksum = hexWord8
Leaving out recordData
, we can now write something like your line
in Applicative
style. Application in Applicative
style is written as <*>
( .
is function composition or composition in Category
s ).
line = _ <$> startOfRecord <*> byteCount <*> address <*> recordType <*> checksum
The compiler will tell us about the type of the hole _
. It says
Found hole `_'
with type: Char -> Word8 -> Word16 -> Word8 -> Word8 -> b
If we had a function with that type, we could use it here and make a ParserT
that reads something like a record, but still missing the recordData
. We'll make a data type to hold all of an intel hex record except for the actual data.
data IntelHexRecord = IntelHexRecord Word8 Word16 Word8 {- [Word8] -} Word8
If we drop this into line
(with const
to discard the startOfRecord
)
line = const IntelHexRecord <$> startOfRecord <*> byteCount <*> address <*> recordType <*> checksum
the compiler will tell us that the type of line
is a parser for our pseudo- IntelHexRecord
.
*> :t line
line :: Stream s m Char => ParsecT s u m IntelHexRecord
This is as far as we can go with Applicative
style. Let's define how to read the recordData
assuming we already somehow know the byteCount
.
recordData :: (Stream s m Char) => Word8 -> ParsecT s u m [Word8]
recordData c = count (fromIntegral c) hexWord8
We'll also modify IntelHexRecord
to have a place to hold the data.
data IntelHexRecord = IntelHexRecord Word8 Word16 Word8 [Word8] Word8
If you have an Applicative f
, there's no way, in general, to choose the structure based on the contents. That's the big difference between an Applicative
and a Monad
; a Monad
's bind, (>>=) :: forall a b. ma -> (a -> mb) -> mb
(>>=) :: forall a b. ma -> (a -> mb) -> mb
, allows you to choose the structure based on the contents. This is exactly what we need to do to determine how to read the recordData
based on the result we obtained earlier by reading the byteCount
.
The easiest way to use one bind >>=
in the definition of line
is to switch entirely to Monad
ic style and do
-notation.
line = do
startOfRecord
bc <- byteCount
addr <- address
rt <- recordType
rd <- recordData bc
cs <- checksum
return $ IntelHexRecord bc addr rt rd cs
As far as my understanding goes, the limitation of Applicative Parsers (compared with Monadic Parsers) is that you are limited to parsing context-free expressions.
By this I mean that decisions about how to parse at a certain point cannot depend on values parsed before, only on the structure (ie a parser failed, so we try to apply a different one).
I find that this can be explained from the operators themselves:
(<*>) :: Applicative f => f (a -> b) -> f a -> f b
(>>=) :: Monad m => m a -> (a -> m b) -> m b
For <*>
you can see that everything takes place at the level of the values 'contained in' the Applicative whereas for >>=
the value can be used to influence the containing structure. This is precicely what makes Monads more powerful than Applicatives.
For your Problem this means that you nedd to use a monadic parser to stick all the individual pieces together, appoximaly like this:
parseRecord = do
count <- byteCount
...
rData <- recordData count
...
return (count,rData,...)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.