
Lookup function for large list of floats - memoized computation?

I need to write a lookup function for a large list of `(float * float)` pairs. The function should add a new entry if the key is not found, or sum the values if the key is found. I have read about memoized computations and it actually wasn't that hard to do. Here is what I have:

let memoLookUp basearr lookarr =
    let t = new System.Collections.Generic.Dictionary<float,float>()
    // seed the dictionary with the base entries
    for (a, b) in basearr do
        t.Add(a, b)
    // sum into existing keys, add new ones
    for (a, b) in lookarr do
        if t.ContainsKey(a) then t.[a] <- t.[a] + b
        else t.Add(a, b)
    t

Sample data:

let basearr = [(41554., 10.0) ; (41555., 11.0) ; (41556., 12.0) ; (41557., 10.0) ; (41558., 13.0) ]

let lookarr = [(41555., 14.0) ; (41556., 15.0) ; (41559., 16.0)]

This returns as expected.
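For reference, this is what "as expected" means on the sample data above (a sketch using the `memoLookUp` function defined earlier):

    let result = memoLookUp basearr lookarr
    // result now contains:
    //   41554.0 -> 10.0
    //   41555.0 -> 25.0   (11.0 + 14.0)
    //   41556.0 -> 27.0   (12.0 + 15.0)
    //   41557.0 -> 10.0
    //   41558.0 -> 13.0
    //   41559.0 -> 16.0   (new key from lookarr)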

My questions are:

  • If the lists are long (say about 30,000 entries each), is it sensible to do it this way from a performance point of view?
  • Or would it be better to sort by the date (in column one of each data list) and then use a more imperative approach?
  • Or is there even something built in in F# or C#?

Your existing code might usefully merge the two arrays to get a more uniform behaviour. Unless you need otherwise (for instance, you want the program to crash if basearr contains a duplicate key), uniform is better:

let incrementalAdderImperative aseq =
    let d = System.Collections.Generic.Dictionary<_,_>()
    aseq |> Seq.iter (fun (k, v) ->
        if d.ContainsKey(k)
        then d.[k] <- d.[k] + v
        else d.Add(k, v))
    d   // return the dictionary (the original snippet dropped this)

To answer your questions :

  • If the lists are long (say about 30,000 each), is it sensible to do it this way from a performance point of view?

You are using a hash-based dictionary by relying on the Dictionary class, so lookups stay amortized O(1) and performance should not degrade as the lists grow. Note that this is a property of this implementation of dictionaries, not of the dictionary functionality described in IDictionary; there are other implementations (for instance the tree-based Map).
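As a sketch of such an alternative, the same accumulation can be done with F#'s immutable `Map` via a fold (note the different cost profile: `Map` is tree-based, with O(log n) lookups rather than the Dictionary's amortized O(1)):

    let mapAdder kvs =
        kvs |> Seq.fold (fun (m: Map<float, float>) (k, v) ->
            match m.TryFind k with
            | Some prev -> m.Add(k, prev + v)   // Map.Add replaces an existing key
            | None      -> m.Add(k, v)) Map.empty

    // let merged = mapAdder (Seq.append basearr lookarr)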

If you are concerned about performance, you should initialize your dictionary with a (fast) estimate of how many keys to expect, to avoid internal resizing, and you should know the concrete types used (such as a hash-based dictionary, etc.).
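For instance, a minimal sketch presizing the dictionary (the capacity argument is only an estimate, not a hard limit):

    // Presize with a fast upper-bound estimate to avoid internal resizing.
    let capacity = List.length basearr + List.length lookarr
    let t = System.Collections.Generic.Dictionary<float, float>(capacity)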

  • Would it be better to sort by the date (in column one of each data list) and then use a more imperative approach?

If you sorted by the date, you could do a fold. I think this would be faster, but the numbers you mention are not that big.

let oneshotAdder reducer kvArr =
    kvArr |> Array.sortInPlaceBy fst
    let a =
        kvArr
        |> Array.fold (fun res (k, v) ->
            match res with
            // same key as the previous entry: merge into the head
            | (prevk, prevv) :: rest when k = prevk -> (k, reducer v prevv) :: rest
            // new key (or empty accumulator): cons a fresh entry
            | _ -> (k, v) :: res) List.empty
    dict a
let data = Array.concat ([basearr; lookarr] |> List.map List.toArray)
let dict2 = oneshotAdder (+) data

PS: in the example you give, basearr and lookarr are lists, not arrays, hence the extra conversion, assuming you indeed want to operate on arrays.

  • Is there even something built in in F# or C#?

In F#, you can natively do a groupBy and then sum the elements. The essence of collection transforms is to pass functions around, so it is no surprise to have this natively. In C#, you can use LINQ to get such enumeration transforms, which under the hood map to functions much like in F#.

let groupByAdder reducer (kvArr:('k*'v) array)  =
    kvArr |> Seq.groupBy fst 
          |> Seq.map (fun (k,vs) -> k , vs |> Seq.map snd |> (Seq.reduce reducer)) 
          |> dict
let dict3 = groupByAdder (+) data 

I would do:

Seq.groupBy fst kvs
|> Seq.map (fun (k, vs) -> k, Seq.map snd vs |> Seq.reduce (+))
|> dict
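As a complete sketch, assuming `kvs` stands for the two sample lists concatenated:

    let kvs = basearr @ lookarr
    let summed =
        Seq.groupBy fst kvs
        |> Seq.map (fun (k, vs) -> k, Seq.map snd vs |> Seq.reduce (+))
        |> dict
    // summed.[41555.0] = 25.0  (11.0 + 14.0)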
