Reading long data structure in Haskell

I have to read a data structure from a space-separated text file, one data item per line. My first attempt would be:

data Person = Person {name :: String, surname :: String, age :: Int, ... dozens of other fields} deriving (Show,...)

main = do
  string <- readFile "filename.txt"
  let people = readPeople string
  do_something people

readPeople s = map (readPerson.words) (lines s)

readPerson row = Person (read(row!!0)) (read(row!!1)) (read(row!!2)) (read(row!!3)) ... (read(row!!dozens))

This code works, but the code for readPerson is terrible: I have to copy-paste (read(row!!n)) for every field in my data structure!

So, as a second attempt, I thought I might exploit currying of the Person function and pass it the arguments one at a time.

Uhm, there must be something in Hoogle, but I cannot figure out the type signature ... Never mind, it looks simple enough and I can write it myself:

readPerson row = readFields Person row

readFields f [x] = (f x)
readFields f (x:xs) = readFields (f (read x)) xs

Ahh, looks much better coding style!

But, it does not compile! Occurs check: cannot construct the infinite type: t ~ String -> t

Indeed, the function f I am passing to readFields has a different type signature in each invocation; that's why I could not figure out its type signature ...

So, my question is: what is the simplest, most elegant way to read a data structure with many fields?

First, it's always good practice to include type signatures for all top-level declarations. It makes code better structured and much more readable.

One simple way to achieve this is to take advantage of applicative functors. During parsing, you have an "effectful" computation where the effect is consuming part of the input and the result is one parsed piece. We can use the State monad to track the remaining input, and create a polymorphic function that consumes one element of the input and parses it with read:

import Control.Applicative
import Control.Monad.State

data Person = Person { name :: String, surname :: String, age :: Int }
    deriving (Eq, Ord, Show, Read)

readField :: (Read a) => State [String] a
readField = state $ \(x : xs) -> (read x, xs)

To parse many such fields, we use the <$> and <*> combinators, which sequence the operations as follows:

readPerson :: [String] -> Person
readPerson = evalState $ Person <$> readField <*> readField <*> readField

The expression Person <$> ... has type State [String] Person, and we run evalState on the given input to run the stateful computation and extract the result. We still need as many readField calls as there are fields, but without having to use indices or explicit types.

For a real program you'd probably include some error handling, as read fails with an exception, as does the pattern (x : xs) if the input list is too short. Note also that read at type String expects a quoted string literal, so an unquoted name such as John would need to be taken verbatim instead. Using a full-fledged parser such as parsec or attoparsec allows you to use the same notation and to have proper error handling, customize the parsing of individual fields, etc.
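As a rough illustration of the error handling mentioned above, here is a minimal sketch that uses only base. The Parser newtype and the stringField helper are hypothetical names introduced for this example (they are not part of parsec or attoparsec); readMaybe comes from Text.Read. It assumes that names are single unquoted tokens and that leftover tokens should count as an error.

```haskell
import Text.Read (readMaybe)

data Person = Person { name :: String, surname :: String, age :: Int }
    deriving (Eq, Show)

-- Hypothetical hand-rolled parser: consume tokens, or fail with Nothing.
newtype Parser a = Parser { runParser :: [String] -> Maybe (a, [String]) }

instance Functor Parser where
    fmap f (Parser p) = Parser $ \ts -> case p ts of
        Nothing       -> Nothing
        Just (a, ts') -> Just (f a, ts')

instance Applicative Parser where
    pure a = Parser $ \ts -> Just (a, ts)
    Parser pf <*> Parser pa = Parser $ \ts -> case pf ts of
        Nothing       -> Nothing
        Just (f, ts') -> case pa ts' of
            Nothing        -> Nothing
            Just (a, ts'') -> Just (f a, ts'')

-- Take the next token verbatim (no quotes needed, unlike read at type String).
stringField :: Parser String
stringField = Parser $ \ts -> case ts of
    (x : xs) -> Just (x, xs)
    []       -> Nothing

-- Parse the next token with readMaybe; fails instead of throwing.
readField :: Read a => Parser a
readField = Parser $ \ts -> case ts of
    (x : xs) -> fmap (\a -> (a, xs)) (readMaybe x)
    []       -> Nothing

-- Succeeds only if every field parses and no tokens are left over.
readPerson :: [String] -> Maybe Person
readPerson ts =
    case runParser (Person <$> stringField <*> stringField <*> readField) ts of
        Just (p, []) -> Just p
        _            -> Nothing
```

With these definitions, readPerson (words "John Smith 42") should yield a Person, while a short row or a non-numeric age yields Nothing rather than an exception.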

An even more general approach is to automate wrapping and unwrapping fields into lists using generics. Then you just derive Generic. If you're interested, I can give an example.

Or, you could use an existing serialization package, either a binary one like cereal or binary, or a text-based one such as aeson or yaml, which usually allow you to do both (either automatically derive (de)serialization from Generic or provide your own custom one).
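One possible shape of the generics approach, sketched with GHC.Generics from base. The class names Field and GFields and the function fromFields are hypothetical, chosen for this illustration. The sketch assumes a single-constructor record (there is no instance for sum types) and one whitespace-separated token per field.

```haskell
{-# LANGUAGE DeriveGeneric, FlexibleContexts, FlexibleInstances, TypeOperators #-}

import GHC.Generics
import Text.Read (readMaybe)

data Person = Person { name :: String, age :: Int, height :: Double }
    deriving (Eq, Show, Generic)

-- Hypothetical per-field parser; String is taken verbatim, numbers via readMaybe.
class Field a where
    parseField :: String -> Maybe a

instance Field Int    where parseField = readMaybe
instance Field Double where parseField = readMaybe
instance Field String where parseField = Just

-- Walks the generic representation, consuming one token per field.
class GFields f where
    gfields :: [String] -> Maybe (f p, [String])

instance GFields U1 where
    gfields ts = Just (U1, ts)

instance (GFields f, GFields g) => GFields (f :*: g) where
    gfields ts = do
        (a, ts')  <- gfields ts
        (b, ts'') <- gfields ts'
        Just (a :*: b, ts'')

-- Metadata wrappers (datatype, constructor, selector) are passed through.
instance GFields f => GFields (M1 i c f) where
    gfields ts = do
        (a, ts') <- gfields ts
        Just (M1 a, ts')

instance Field a => GFields (K1 i a) where
    gfields (x : xs) = do
        a <- parseField x
        Just (K1 a, xs)
    gfields [] = Nothing

-- Succeeds only if all fields parse and no tokens remain.
fromFields :: (Generic a, GFields (Rep a)) => [String] -> Maybe a
fromFields ts = case gfields ts of
    Just (rep, []) -> Just (to rep)
    _              -> Nothing
```

The record definition is then the only place where the fields are listed: adding a field to Person needs no change to the parsing code, as long as its type has a Field instance. The result type usually has to be fixed by context or an annotation, for example fromFields (words "John 42 6.05") :: Maybe Person.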

EDIT: A simpler solution if you are reading from strings:

{-# LANGUAGE FlexibleInstances #-}

data Person = Person { name :: String, age :: Int, height :: Double }
    deriving Show

class Person' a where
    person :: a -> [String] -> Maybe Person

instance Person' Person where
    person c [] = Just c
    person _ _  = Nothing

instance (Read a, Person' b) => Person' (a -> b) where
    person f (x:xs) = person (f $ read x) xs
    person _ _      = Nothing

instance {-# OVERLAPPING #-} Person' a => Person' (String -> a) where
    person f (x:xs) = person (f x) xs
    person _ _      = Nothing

Then, if the list is of the right size, you get:

> person Person $ words "John 42 6.05"
Just (Person {name = "John", age = 42, height = 6.05})

and if not, you get Nothing:

> person Person $ words "John 42"
Nothing

The question "Constructing Haskell data types with many fields" provides a solution for when all the record fields are of the same type. If they are not, a slightly more polymorphic solution would be:

{-# LANGUAGE FlexibleInstances, CPP #-}

data Person = Person { name :: String, age :: Int, height :: Double }
    deriving Show

data Val = IVal Int | DVal Double | SVal String

class Person' a where
    person :: a -> [Val] -> Maybe Person

instance Person' Person where
    person c [] = Just c
    person _ _  = Nothing

#define PERSON(t, n)                                \
instance (Person' a) => Person' (t -> a) where {    \
    person f ((n i):xs) = person (f i) xs;          \
    person _ _ = Nothing; }                         \

PERSON(Int,    IVal)
PERSON(Double, DVal)
PERSON(String, SVal)

Then:

> person Person [SVal "John", IVal 42, DVal 6.05]
Just (Person {name = "John", age = 42, height = 6.05})

To construct Val values, you can create another type class and define the desired instances:

class Cast a where
    cast :: a -> Val

instance Cast Int    where cast = IVal
instance Cast Double where cast = DVal
instance Cast String where cast = SVal

Then the notation becomes slightly simpler:

> person Person [cast "John", cast (42 :: Int), cast 6.05]
Just (Person {name = "John", age = 42, height = 6.05})
