Restriction on the data type definition

Question

I have a type synonym type Entity = ([Feature], Body) for whatever Feature and Body mean. Objects of Entity type are to be grouped together:

type Bunch = [Entity]

and the assumption, crucial for the algorithm working with Bunch, is that any two entities in the same bunch have the same number of features.

If I were to implement this constraint in an OOP language, I would add the corresponding check to the method encapsulating the addition of entities into a bunch. Is there a better way to do it in Haskell? Preferably, on the definition level. (If the definition of Entity also needs to be changed, no problem.)

Answer 1

Using type-level length annotations

So here's the deal. Haskell can encode natural numbers at the type level, and you can attach them to a list as a "phantom type" (a type parameter that no constructor actually uses). However you do it, the types will look something like this:

data Z
data S n
data LAList x len = LAList [x] -- length-annotated list

Then you can add some construction functions for convenience:

lalist1 :: x -> LAList x (S Z)
lalist1 x = LAList [x]
lalist2 :: x -> x -> LAList x (S (S Z))
lalist2 x y = LAList [x, y]
-- ...

And then you've got more generic methods:

(~:) :: x -> LAList x n -> LAList x (S n)
x ~: LAList xs = LAList (x : xs)
infixr 5 ~:

nil :: LAList x Z
nil = LAList []

lahead :: LAList x (S n) -> x
lahead (LAList xs) = head xs

latail :: LAList x (S n) -> LAList x n
latail (LAList xs) = LAList (tail xs)

The built-in list type doesn't track any of this by itself, because doing so is complicated. You may be interested in the Data.FixedList package for a somewhat different approach, too. Basically, every approach starts off looking a little weird, with data types that have no constructors, but it starts to look normal after a little while.
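Here is a minimal sketch of how these pieces fit together. Note that `latail` has to rewrap its result in `LAList` to type-check. The names `toPlainList` and `example` are hypothetical and only serve the illustration.

```haskell
-- Phantom-typed, length-annotated list (a sketch, not a library API).

data Z      -- type-level zero; has no values
data S n    -- type-level successor; has no values

data LAList x len = LAList [x]

nil :: LAList x Z
nil = LAList []

infixr 5 ~:
(~:) :: x -> LAList x n -> LAList x (S n)
x ~: LAList xs = LAList (x : xs)

-- head and tail are safe here only as long as every LAList
-- is built through nil and (~:).
lahead :: LAList x (S n) -> x
lahead (LAList xs) = head xs

latail :: LAList x (S n) -> LAList x n
latail (LAList xs) = LAList (tail xs)

-- Hypothetical helper: forget the length annotation.
toPlainList :: LAList x n -> [x]
toPlainList (LAList xs) = xs

-- Three elements, so the length index is S (S (S Z)).
example :: LAList Int (S (S (S Z)))
example = 1 ~: 2 ~: 3 ~: nil

-- Rejected by the type checker, because Z does not match S n:
-- bad = lahead nil
```

The guarantee is only as strong as the module boundary: if the `LAList` constructor is exported, anyone can write `LAList [] :: LAList Int (S Z)` and break `lahead`. Exporting only `nil`, `(~:)` and the accessors avoids that.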

You might also be able to write a type class so that all of the lalist1, lalist2, ... functions above can be replaced with

class FixedLength t where
    la :: t x -> LAList x n

but you will probably need the -XTypeSynonymInstances flag to do this, as you want to do something like

type Pair x = (x, x)
instance FixedLength Pair where
    la :: Pair x -> LAList x (S (S Z))
    la (a, b) = LAList [a, b]

(it's a kind mismatch when you go from (a, b) to Pair a ).
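One way this idea might be made to compile is to let each instance choose its own length through an associated type family, and to use newtype wrappers instead of a type synonym (a partially applied synonym cannot be an instance head). This is a sketch under those assumptions, not the answer's original design; `Len`, `Pair`, `Triple` and `toPlainList` are hypothetical names.

```haskell
{-# LANGUAGE TypeFamilies #-}

data Z
data S n
data LAList x len = LAList [x]

-- Hypothetical wrappers: real type constructors of kind * -> *.
newtype Pair x = Pair (x, x)
newtype Triple x = Triple (x, x, x)

class FixedLength t where
  -- The length that belongs to container t, chosen per instance.
  type Len t
  la :: t x -> LAList x (Len t)

instance FixedLength Pair where
  type Len Pair = S (S Z)
  la (Pair (a, b)) = LAList [a, b]

instance FixedLength Triple where
  type Len Triple = S (S (S Z))
  la (Triple (a, b, c)) = LAList [a, b, c]

-- Hypothetical helper: forget the length annotation.
toPlainList :: LAList x n -> [x]
toPlainList (LAList xs) = xs
```

With this, `la (Pair (1, 2))` has type `LAList Integer (S (S Z))` after defaulting, and the length follows from the wrapper rather than from the caller.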

Using runtime checking

You can very easily take a different approach and encapsulate all of this as a runtime error or explicitly model the error in your code:

-- this may change if you change your definition of the Entity type
features :: Entity -> [Feature]
features = fst 

-- we also assume a runBunch :: [Entity] -> Something function 
-- that you're trying to run on this Bunch.

allTheSame :: (Eq x) => [x] -> Bool
allTheSame (x : xs) = all (x ==) xs
allTheSame [] = True

permissiveBunch :: [Entity] -> Maybe Something
permissiveBunch es
  | allTheSame (map (length . features) es) = Just (runBunch es)
  | otherwise = Nothing

strictBunch :: [Entity] -> Something
strictBunch es 
  | allTheSame (map (length . features) es) = runBunch es
  | otherwise = error ("runBunch requires all feature lists to be the same length; saw instead " ++ show (map (length . features) es))

Then your runBunch can simply assume that all the lengths are the same, since that is explicitly checked above. If you need to pair up corresponding features, you can sidestep pattern-matching awkwardness with, say, the zip :: [a] -> [b] -> [(a, b)] function from the Prelude. (The awkwardness: an algorithm that pattern-matches only on runBunch' (x:xs) (y:ys) and runBunch' [] [] leaves two patterns unhandled, and the compiler can warn you about the incomplete match.)
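The same check can also be moved into a constructor function, which is the closest analogue of the OOP approach from the question. The sketch below assumes placeholder definitions for Feature and Body; `Bunch` as a newtype, `mkBunch`, `bunchEntities` and `pairFeatures` are hypothetical names.

```haskell
-- Placeholders: the question leaves Feature and Body unspecified.
type Feature = Int
type Body = String
type Entity = ([Feature], Body)

-- If the module exports the type Bunch but not its constructor,
-- mkBunch is the only way to obtain a Bunch.
newtype Bunch = Bunch [Entity]

allTheSame :: Eq x => [x] -> Bool
allTheSame (x : xs) = all (x ==) xs
allTheSame [] = True

-- Validate once, at construction time.
mkBunch :: [Entity] -> Maybe Bunch
mkBunch es
  | allTheSame (map (length . fst) es) = Just (Bunch es)
  | otherwise = Nothing

bunchEntities :: Bunch -> [Entity]
bunchEntities (Bunch es) = es

-- Pair up corresponding features with zip instead of matching on
-- (x:xs) (y:ys), so there are no unhandled patterns.
pairFeatures :: Entity -> Entity -> [(Feature, Feature)]
pairFeatures (fs1, _) (fs2, _) = zip fs1 fs2
```

Code that receives a `Bunch` can then rely on the invariant without rechecking it, while the type of `mkBunch` forces callers to handle the failure case.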

Using tuples and type classes

One final way to do it which is a compromise between the two (but makes for pretty good Haskell code) involves making Entity parametrized over all features:

type Entity x = (x, Body)

and then adding a class whose method zips the features of two entities together, with one instance per supported number of features:

class ZippableFeatures z where
    fzip :: z -> z -> [(Feature, Feature)]

instance ZippableFeatures () where
    fzip () () = []

instance ZippableFeatures Feature where
    fzip f1 f2 = [(f1, f2)]

instance ZippableFeatures (Feature, Feature) where
    fzip (a1, a2) (b1, b2) = [(a1, b1), (a2, b2)]

Then you can use tuples for your feature lists, as long as they don't get any larger than the maximum tuple length (which is 15 on my GHC). If you go larger than that, of course, you can always define your own data types, but it's not going to be as general as type-annotated lists.

If you do this, your type signature for runBunch will simply look like:

runBunch :: (ZippableFeatures z) => [Entity z] -> Something

When you run it on things with the wrong number of features you'll get compiler errors that it can't unify the type (a, b) with (a, b, c).
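As a small illustration of the tuple approach, here is a sketch that assumes Feature is a Double and Body is a String (both placeholders). The function `sqDist` and the values `p`, `q` and `r` are hypothetical stand-ins for whatever the real algorithm does.

```haskell
{-# LANGUAGE FlexibleInstances #-}

type Feature = Double   -- placeholder
type Body = String      -- placeholder
type Entity x = (x, Body)

class ZippableFeatures z where
  fzip :: z -> z -> [(Feature, Feature)]

instance ZippableFeatures (Feature, Feature) where
  fzip (a1, a2) (b1, b2) = [(a1, b1), (a2, b2)]

instance ZippableFeatures (Feature, Feature, Feature) where
  fzip (a1, a2, a3) (b1, b2, b3) = [(a1, b1), (a2, b2), (a3, b3)]

-- Hypothetical algorithm step: squared distance between two entities.
-- Both arguments share the type z, so their feature counts must agree.
sqDist :: ZippableFeatures z => Entity z -> Entity z -> Double
sqDist (f1, _) (f2, _) = sum [(a - b) * (a - b) | (a, b) <- fzip f1 f2]

p, q :: Entity (Feature, Feature)
p = ((0, 0), "p")
q = ((3, 4), "q")

r :: Entity (Feature, Feature, Feature)
r = ((1, 2, 3), "r")

-- Rejected at compile time: a pair does not unify with a triple.
-- bad = sqDist p r
```

Since a list `[Entity z]` fixes a single `z` for all of its elements, a bunch built this way cannot mix entities with two and three features.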

Answer 2

There are various ways to enforce length constraints like that; here's one:

{-# LANGUAGE DataKinds, KindSignatures, GADTs, TypeFamilies #-}
import Prelude hiding (foldr)
import Data.Foldable
import Data.Monoid
import Data.Traversable
import Control.Applicative

data Feature  -- Whatever that really is

data Body  -- Whatever that really is

data Nat = Z | S Nat  -- Natural numbers

type family Plus (m::Nat) (n::Nat) where  -- Type level natural number addition
  Plus Z n = n
  Plus (S m) n = S (Plus m n)

data LList (n :: Nat) a where  -- Lists tagged with their length at the type level
  Nil :: LList Z a
  Cons :: a -> LList n a -> LList (S n) a

Some functions on these lists:

llHead :: LList (S n) a -> a
llHead (Cons x _) = x

llTail :: LList (S n) a -> LList n a
llTail (Cons _ xs) = xs

llAppend :: LList m a -> LList n a -> LList (Plus m n) a
llAppend Nil ys = ys
llAppend (Cons x xs) ys = Cons x (llAppend xs ys)

data Entity n = Entity (LList n Feature) Body

data Bunch where
   Bunch :: [Entity n] -> Bunch

Some instances:

instance Functor (LList n) where
   fmap f Nil = Nil
   fmap f (Cons x xs) = Cons (f x) (fmap f xs)

instance Foldable (LList n) where
   foldMap f Nil = mempty
   foldMap f (Cons x xs) = f x `mappend` foldMap f xs

instance Traversable (LList n) where
   traverse f Nil = pure Nil
   traverse f (Cons x xs) = Cons <$> f x <*> traverse f xs

And so on. Note that n in the definition of Bunch is existential. It can be anything, and what it actually is doesn't affect the type: all bunches have the same type. This limits what you can do with bunches to a certain extent. Alternatively, you can tag the bunch with the length of its feature lists. It all depends what you need to do with this stuff in the end.
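To make the trade-off concrete, here is a sketch of the tagged alternative next to the existential one, assuming placeholder definitions for Feature and Body. The names `TBunch`, `merge`, `forget`, `bunchSize`, `llZipWith` and `llToList` are hypothetical.

```haskell
{-# LANGUAGE DataKinds, KindSignatures, GADTs #-}

data Nat = Z | S Nat

data LList (n :: Nat) a where
  Nil  :: LList Z a
  Cons :: a -> LList n a -> LList (S n) a

type Feature = Int    -- placeholder
type Body = String    -- placeholder

data Entity (n :: Nat) = Entity (LList n Feature) Body

-- Existential version: n is hidden inside the constructor.
data Bunch where
  Bunch :: [Entity n] -> Bunch

-- Tagged version: n is visible in the type of the bunch.
newtype TBunch (n :: Nat) = TBunch [Entity n]

-- Two tagged bunches can be merged only if their lengths agree.
-- With the existential Bunch this cannot be written, because the
-- two hidden lengths are not known to be equal.
merge :: TBunch n -> TBunch n -> TBunch n
merge (TBunch xs) (TBunch ys) = TBunch (xs ++ ys)

-- Going from tagged to existential is always possible.
forget :: TBunch n -> Bunch
forget (TBunch es) = Bunch es

bunchSize :: Bunch -> Int
bunchSize (Bunch es) = length es

-- Zipping equal-length lists needs no case for mismatched lengths.
llZipWith :: (a -> b -> c) -> LList n a -> LList n b -> LList n c
llZipWith _ Nil Nil = Nil
llZipWith f (Cons x xs) (Cons y ys) = Cons (f x y) (llZipWith f xs ys)

llToList :: LList n a -> [a]
llToList Nil = []
llToList (Cons x xs) = x : llToList xs
```

The tagged form suits code that combines bunches or compares entities across bunches; the existential form suits code that stores bunches of differing lengths in one collection.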

2 answers

solution1 (Answer 1): 2, ACCEPTED, 2014-10-16 18:39:58
solution2 (Answer 2): 2, 2014-10-16 20:19:28
