How do you parse an Intel Hex Record with applicative functors using the haskell parsec library?

Question

I would like to parse an Intel Hex Record with parsec using the applicative functor style. A typical record looks like the following:

:10010000214601360121470136007EFE09D2190140

The first character is always ':'. The next two characters are a hex string representing the number of data bytes in the record, and the next four characters are a hex string identifying the start address of the data. I had code like the following, but I don't know how to applicatively pass the byte count to the parser that parses the data bytes. My non-working code looks like this:

line = startOfRecord . byteCount . address . recordType . recordData . checksum
startOfRecord = char ':'
byteCount = toHexValue <$> count 2 hexDigit
address = toHexValue <$> count 4 hexDigit
recordType = toHexValue <$> count 2 hexDigit
recordData c = toHexValue <$> count c hexDigit
recordData c CharParser = count c hexDigit
checksum = toHexValue <$> count 2 hexDigit

toHexValue :: String -> Int
toHexValue = fst . head . readHex

Could anyone help me? Thanks.

Answer 1

There are a number of things not included in your question that you need in order to use parsec. To define things like startOfRecord without type signatures, we need to disable the dreaded monomorphism restriction. If we want to write type signatures for anything like startOfRecord, we also need to enable FlexibleContexts. We also need to import parsec, Control.Applicative, and Numeric (readHex).

{-# LANGUAGE NoMonomorphismRestriction #-}
{-# LANGUAGE FlexibleContexts #-}

import Text.Parsec
import Control.Applicative
import Numeric (readHex)

I'm also going to use Word8 and Word16 from Data.Word, since they exactly match the field sizes used in Intel Hex records.

import Data.Word

Ignoring the recordData for a moment, we can define how to read hex values for bytes (Word8) and 16-bit integer addresses (Word16).

hexWord8 :: (Stream s m Char) => ParsecT s u m Word8
hexWord8 = toHexValue <$> count 2 hexDigit

hexWord16 :: (Stream s m Char) => ParsecT s u m Word16
hexWord16 = toHexValue <$> count 4 hexDigit

toHexValue :: (Num a, Eq a) => String -> a
toHexValue = fst . head . readHex
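As a quick sanity check, here is a minimal sketch that runs these two parsers on plain String input with parsec's parse function. The wrappers parseByte and parseAddress are hypothetical names introduced only for this illustration. Note that toHexValue relies on head, which is safe here only because hexDigit has already validated every character.

```haskell
{-# LANGUAGE FlexibleContexts #-}

import Data.Word (Word8, Word16)
import Numeric (readHex)
import Text.Parsec

-- Two hex digits, read as one byte.
hexWord8 :: Stream s m Char => ParsecT s u m Word8
hexWord8 = toHexValue <$> count 2 hexDigit

-- Four hex digits, read as a 16-bit value.
hexWord16 :: Stream s m Char => ParsecT s u m Word16
hexWord16 = toHexValue <$> count 4 hexDigit

-- Partial in general (head); fine here because the digits were
-- already accepted by hexDigit, so readHex always yields a result.
toHexValue :: (Num a, Eq a) => String -> a
toHexValue = fst . head . readHex

-- Hypothetical helpers for trying the parsers on a String.
parseByte :: String -> Either ParseError Word8
parseByte = parse hexWord8 "byte"

parseAddress :: String -> Either ParseError Word16
parseAddress = parse hexWord16 "address"
```

With these, parseByte "10" should give Right 16 and parseAddress "0100" should give Right 256, while an input such as "1G" is rejected with a ParseError.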

This lets us define all of the pieces except for recordData.

startOfRecord = char ':'
byteCount = hexWord8
address = hexWord16
recordType = hexWord8
checksum = hexWord8

Leaving out recordData, we can now write something like your line in Applicative style. Application in Applicative style is written as <*> (the . operator is function composition, or composition in a Category).

line = _ <$> startOfRecord <*> byteCount <*> address <*> recordType <*> checksum

The compiler will tell us about the type of the hole _. It says

    Found hole `_'
      with type: Char -> Word8 -> Word16 -> Word8 -> Word8 -> b

If we had a function with that type, we could use it here and make a ParsecT that reads something like a record, but one that is still missing the recordData. We'll make a data type to hold all of an Intel Hex record except for the actual data.

data IntelHexRecord = IntelHexRecord Word8 Word16 Word8 {- [Word8] -} Word8

If we drop this into line (with const to discard the result of startOfRecord)

line = const IntelHexRecord <$> startOfRecord <*> byteCount <*> address <*> recordType <*> checksum

the compiler will tell us that the type of line is a parser for our pseudo-IntelHexRecord.

*> :t line
line :: Stream s m Char => ParsecT s u m IntelHexRecord

This is as far as we can go with Applicative style. Let's define how to read the recordData, assuming we already somehow know the byteCount.

recordData :: (Stream s m Char) => Word8 -> ParsecT s u m [Word8]
recordData c = count (fromIntegral c) hexWord8

We'll also modify IntelHexRecord to have a place to hold the data.

data IntelHexRecord = IntelHexRecord Word8 Word16 Word8 [Word8] Word8

If you have an Applicative f, there's no way, in general, to choose the structure based on the contents. That's the big difference between an Applicative and a Monad; a Monad's bind, (>>=) :: forall a b. m a -> (a -> m b) -> m b, allows you to choose the structure based on the contents. This is exactly what we need to do to determine how to read the recordData based on the result we obtained earlier by reading the byteCount.

The easiest way to use one bind >>= in the definition of line is to switch entirely to Monadic style and do-notation.

line = do
    startOfRecord
    bc   <- byteCount
    addr <- address
    rt   <- recordType
    rd   <- recordData bc
    cs   <- checksum
    return $ IntelHexRecord bc addr rt rd cs
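To tie the pieces together, here is a self-contained sketch that parses the sample record from the question. It is one possible arrangement, not the only one. The names lineOneBind, parseRecord and sample, and the deriving clause on IntelHexRecord, are assumptions added for illustration. lineOneBind shows that a single >>= for the byte count is enough, with every other field still combined in Applicative style.

```haskell
{-# LANGUAGE FlexibleContexts #-}

import Data.Word (Word8, Word16)
import Numeric (readHex)
import Text.Parsec

-- Fields: byte count, address, record type, data bytes, checksum.
-- The deriving clause is added here only so results can be shown and compared.
data IntelHexRecord = IntelHexRecord Word8 Word16 Word8 [Word8] Word8
  deriving (Eq, Show)

hexWord8 :: Stream s m Char => ParsecT s u m Word8
hexWord8 = toHexValue <$> count 2 hexDigit

hexWord16 :: Stream s m Char => ParsecT s u m Word16
hexWord16 = toHexValue <$> count 4 hexDigit

-- head is safe here: hexDigit has already validated the characters.
toHexValue :: (Num a, Eq a) => String -> a
toHexValue = fst . head . readHex

-- Read exactly c data bytes (two hex digits each).
recordData :: Stream s m Char => Word8 -> ParsecT s u m [Word8]
recordData c = count (fromIntegral c) hexWord8

-- The do-notation version from above.
line :: Stream s m Char => ParsecT s u m IntelHexRecord
line = do
  _    <- char ':'
  bc   <- hexWord8
  addr <- hexWord16
  rt   <- hexWord8
  rd   <- recordData bc
  cs   <- hexWord8
  return (IntelHexRecord bc addr rt rd cs)

-- Hypothetical variant: one bind for the byte count, the rest Applicative.
lineOneBind :: Stream s m Char => ParsecT s u m IntelHexRecord
lineOneBind = char ':' *> (hexWord8 >>= \bc ->
  IntelHexRecord bc <$> hexWord16 <*> hexWord8 <*> recordData bc <*> hexWord8)

-- The example record from the question.
sample :: String
sample = ":10010000214601360121470136007EFE09D2190140"

-- Hypothetical helper: parse one record and require end of input after it.
parseRecord :: String -> Either ParseError IntelHexRecord
parseRecord = parse (line <* eof) "record"
```

Calling parseRecord sample should yield a record with byte count 16, address 256, record type 0, sixteen data bytes, and checksum 64 (0x40). Neither version checks that the checksum is actually valid; that would be a separate step after parsing.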

Answer 2

As far as my understanding goes, the limitation of Applicative Parsers (compared with Monadic Parsers) is that you are limited to parsing context-free expressions.

By this I mean that decisions about how to parse at a certain point cannot depend on values parsed before, only on the structure (i.e. a parser failed, so we try to apply a different one).

I find that this can be explained from the operators themselves:

(<*>) :: Applicative f => f (a -> b) -> f a -> f b
(>>=) :: Monad m => m a -> (a -> m b) -> m b

For <*> you can see that everything takes place at the level of the values 'contained in' the Applicative, whereas for >>= the value can be used to influence the containing structure. This is precisely what makes Monads more powerful than Applicatives.
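A small sketch may make the difference concrete. It uses a toy length-prefixed format rather than Intel Hex, and the names fixedShape and lengthPrefixed are made up for this illustration. With <*> the number of letters to read is fixed when the parser is written; with >>= it is taken from the digit that was just parsed.

```haskell
import Data.Char (digitToInt)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Applicative: the shape is fixed up front. This always reads one digit
-- followed by exactly two letters, whatever the digit's value is.
fixedShape :: Parser (Char, String)
fixedShape = (,) <$> digit <*> count 2 letter

-- Monadic: the parsed digit decides how many letters are read next.
lengthPrefixed :: Parser String
lengthPrefixed = digit >>= \d -> count (digitToInt d) letter
```

For the input "3abc", parse fixedShape gives Right ('3',"ab") and parse lengthPrefixed gives Right "abc". For "1abc", lengthPrefixed reads only "a". This is the same situation as the byte count deciding how many data bytes follow.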

For your problem this means that you need to use a monadic parser to stick all the individual pieces together, approximately like this:

parseRecord = do
  count <- byteCount
  ...
  rData <- recordData count
  ...
  return (count,rData,...)

solution1
3 ACCEPTED 2015-02-14 00:45:39

solution2
1 2015-02-14 02:28:08
