
Indexing a Haskell data structure for queries

I have a Data.Vector of Dog records, each of which identifies the House where that dog lives. I need a lookup routine that finds all the dogs living in a given house, roughly like the following, but with constant-time lookups, which this first version cannot provide.

dogs_by_houses dogs h = [ d | d <- Vec.toList dogs, h == house d ]

As I understand it, a central rule for optimizing Haskell code is that the compiler computes each expression only once inside its enclosing lambda expression. I must therefore build the lookup table for this particular dogs inside the dogs_by_houses dogs expression, before binding h, yes?

I presume that Data.Vector is the best tool for this task, although apparently you cannot shrink a Vector the way you can a C++ vector. I'd implement this roughly as follows:

dogs_by_houses :: Vec.Vector Dog -> Int -> [Dog]
dogs_by_houses dogs = let {
        dog_house = house_id . house ;
        -- one slot per house id 0..maximum, hence the 1 +
        v0 = Vec.replicate (1 + (maximum . map dog_house $ Vec.toList dogs)) [] ;
        f v d = let { h = dog_house d } in v // [(h,d:v!h)] ;
        dbh = Vec.foldl' f v0 dogs
   } in (dbh !)

Is there anything grossly silly here optimization-wise? I presume that strictness annotations on variables like dbh won't help much, since by definition dogs must be traversed before dbh makes sense.

Is there any big advantage to doing this with an MVector and create instead of folds returning modified immutable vectors? All my attempts at using MVector and create have so far come out much less concise, with various layers of do blocks or fold (>>)-like constructs. I presume the compiler should simply build dbh in place even without being explicitly given an MVector.

Is this algorithm impossible to achieve with lists? You occasionally see people building lazy infinite lists of primes and then selecting the nth prime number with primes !! n. I'd assume that retrieving the nth prime that way requires traversing the first n primes in the list every time you do so. Conversely, I've noticed that GHC stores strings as C strings, not lists. Would the compiler simply represent known list elements as an array rather than re-traversing the list for each one?

Update :

I've employed the answers by Paul Johnson and Louis Wasserman to build a function to index an arbitrary vector this way because I must do so based upon several different indexing functions.

vector_indexer idx vec = \i -> (Vec.!) t i
  where m = maximum $ map idx $ Vec.toList vec
        t = Vec.accumulate (flip (:)) (Vec.replicate (m + 1) []) 
               $ Vec.map (\v -> (idx v, v)) vec
dogs_by_houses = vector_indexer (house_id . house)

I haven't profiled this yet, but I will eventually. I'd expect that one must write my_d_by_h = dogs_by_houses my_dogs and then call my_d_by_h to benefit from the indexing.
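To make that usage concrete, here is a minimal self-contained sketch of the indexer being built once and then queried several times. The Dog and House record definitions, the name field, and the sample data are assumptions made up for illustration; the table is sized m + 1 on the assumption that ids start at 0 and the vector is non-empty.

```haskell
import qualified Data.Vector as Vec

-- Hypothetical record types standing in for the real ones.
newtype House = House { house_id :: Int } deriving (Eq, Show)
data Dog = Dog { name :: String, house :: House } deriving (Eq, Show)

-- Build the table once per vector; the returned closure refers to it.
vector_indexer :: (a -> Int) -> Vec.Vector a -> Int -> [a]
vector_indexer idx vec = \i -> t Vec.! i
  where
    -- Largest index present; fails on an empty vector.
    m = Vec.maximum (Vec.map idx vec)
    -- One bucket per index 0..m, so m + 1 buckets in total.
    t = Vec.accumulate (flip (:)) (Vec.replicate (m + 1) [])
          (Vec.map (\v -> (idx v, v)) vec)

-- Sample data, invented for this sketch.
my_dogs :: Vec.Vector Dog
my_dogs = Vec.fromList
  [ Dog "Rex" (House 0), Dog "Fido" (House 2), Dog "Spot" (House 0) ]

-- Bind the partial application once, then reuse it for every lookup.
my_d_by_h :: Int -> [Dog]
my_d_by_h = vector_indexer (house_id . house) my_dogs
```

With this data, my_d_by_h 0 holds Spot and Rex (later dogs are consed onto the front of their bucket), my_d_by_h 1 is empty, and my_d_by_h 2 holds Fido. Looking up an id larger than m is out of bounds and raises an error.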

I would build the table with

Vec.accumulate (flip (:)) (Vec.replicate maxHouse []) 
  (Vec.map (\ d -> (dog_house d, d)) dogs)

which will definitely allocate at most one intermediate vector, and I suspect it might be smart enough not to allocate any intermediate vectors at all.
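As a small illustration of what accumulate does with a cons-style combining function, here is a toy sketch on characters rather than dogs. The names updates and buckets and the sample pairs are made up for this example.

```haskell
import qualified Data.Vector as Vec

-- Toy input: (bucket index, element) pairs.
updates :: Vec.Vector (Int, Char)
updates = Vec.fromList [(0, 'a'), (2, 'b'), (0, 'c')]

-- Start from three empty buckets and cons each element onto its bucket.
-- accumulate passes the old bucket first and the new element second,
-- which is why the combining function is flip (:) rather than (:).
buckets :: Vec.Vector String
buckets = Vec.accumulate (flip (:)) (Vec.replicate 3 []) updates
-- buckets == Vec.fromList ["ca", "", "b"]
```

The initial vector needs a slot for every index that appears in the updates, so if ids start at 0 its length has to be the largest id plus one.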

I once got caught by a nasty gotcha doing something like this. I was using Data.Map.Map as the lookup table, but the principle was the same. My function took a list of key-value pairs, constructed a Map, and returned the lookup function. It went something like this:

makeTable :: [(Key, Value)] -> Key -> Value
makeTable pairs = ((fromList pairs) !)

It seemed obvious to me that I could then write something like

myTable = makeTable [("foo", fooValue), ("bar", barValue)  ... and so on]

Then I could have an O(log N) lookup by saying

v = myTable "foo"

However, what GHC actually did was rebuild the entire Map from the list on every call. When you create a partial application in this way, GHC doesn't try to figure out which values it can derive from the arguments it's got; it just stores the raw arguments and runs the entire function body on every call. Perfectly reasonable behaviour, but not what I wanted.

What I had to write instead was this:

makeTable pairs = \k -> table ! k
   where table = fromList pairs

I imagine you will have to do the same thing.
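For reference, a self-contained version of that pattern might look like the sketch below. It swaps (!) for Map.lookup so that a missing key gives Nothing instead of an error, and the String keys and Int values are placeholders invented for the example.

```haskell
import qualified Data.Map as Map

-- 'table' is bound outside the lambda, so every call made through the
-- same returned closure can reuse the Map instead of rebuilding it.
makeTable :: Ord k => [(k, v)] -> k -> Maybe v
makeTable pairs = \k -> Map.lookup k table
  where table = Map.fromList pairs

-- Bind the partial application once; the placeholder values are made up.
myTable :: String -> Maybe Int
myTable = makeTable [("foo", 1), ("bar", 2)]
```

Here myTable "foo" gives Just 1 and myTable "baz" gives Nothing. Whether the Map is really built only once is worth confirming by profiling, since it depends on how the closure is bound and on the optimisations GHC applies.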
