
bulk adding to a map, in F#

I've a simple type:

type Token =
    {
        Symbol:     string
        Address:    string
        Decimals:   int
    }

and an in-memory cache (the tokens themselves are stored in a database):

let mutable private tokenCache : Map<string, Token> = Map.empty

part of the Tokens module.

Sometimes I get a few new entries to add, in the form of a Token array, and I want to update the cache.

It happens very rarely (less than once per million reads).

When I update the database with the new batch, I want to update the cache map as well and I just wrote this:

tokenCache <- tokens |> Seq.fold (fun m i -> m.Add(i.Symbol, i)) tokenCache

Since this is happening rarely, I don't really care about the performance so this question is out of curiosity:

When I do this, the map will be recreated once per entry in the tokens array: 10 new tokens means 10 map re-creations. I assumed this was the most 'F#' way to deal with this. It got me thinking: wouldn't it be more efficient to convert the map to a list of KVPs, append the new entries, take the distinct ones, and re-create the map from that? Or is there another method I haven't thought about?

This is not an answer to the question as stated, but a clarification to something you asked in the comments.

This premise that you have expressed is incorrect:

the map will be recreated once per entry in the tokens array

The map doesn't actually get completely recreated for every insertion. But at the same time, another hypothesis that you have expressed in the comments is also incorrect:

so the immutability is from the language's perspective, the compiler doesn't recreate the object behind the scenes?

Immutability is real. But the map also doesn't get recreated in full on every insertion. Part of its internal structure is rebuilt, and the rest is reused.
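To see that immutability is real from the caller's side, here is a minimal sketch. The names `before` and `after` and the sample entries are made up for illustration; they are not from the question's code.

```fsharp
// Hypothetical sample data: symbol -> decimals.
let before = Map.ofList [ "ETH", 18; "USDC", 6 ]

// Add returns a new map value; it does not modify `before`.
let after = before.Add("WBTC", 8)

// `before` still has its two original entries, `after` has three.
printfn "before: %d entries, after: %d entries" before.Count after.Count
printfn "before contains WBTC: %b" (before.ContainsKey "WBTC")
```

Both values remain usable after the call, which is what the question's fold relies on: each step takes one map value and returns another.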

I'm not going to describe exactly how Map works, because that's too involved. Instead, I'll illustrate the principle on a list.


F# lists are "singly linked lists", which means each non-empty list consists of two things: (1) the first element (called the "head") and (2) a reference (pointer) to the rest of the elements (called the "tail"). The crucial thing to note here is that the "rest of the elements" part is also itself a list.

So if you declare a list like this:

let x = [1; 2; 3]

It would be represented in memory something like this:

x -> 1 -> 2 -> 3 -> []

The name x is a reference to the first element, each element has a reference to the next one, and the last one has a reference to the empty list. So far so good.

Now let's see what happens if you add a new element to this list:

let y = 42 :: x

Now the list y will be represented like this:

y -> 42 -> 1 -> 2 -> 3 -> []

But this diagram shows only half the picture. If we look at the memory in a wider scope than just y, we'll see this:

    x -> 1 -> 2 -> 3 -> []
         ^
         |
        /
y ->  42

So you see that the y list consists of two things (as all lists do): the first element 42 and a reference to the rest of the elements 1->2->3. But the "rest of the elements" bit is not exclusive to y: it has its own name, x.

And so it is that you have two lists x and y, of 3 and 4 elements respectively, but together they occupy just 4 cells of memory, not 7.

And another thing to note is that when I created the y list, I did not have to recreate the whole list from scratch. I did not have to copy 1, 2, and 3 from x to y. Those cells stayed right where they were, and y only got a reference to them.

And a third thing to note is that this means that prepending an element to a list is an O(1) operation. No copying of the list involved.

And a fourth (and hopefully final) thing to note is that this approach is only possible because of immutability. It is only because I know that the x list will never change that I can take a reference to it. If it were subject to change, I would be forced to copy it just in case.
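The sharing can be checked directly. This is a small sketch using the same x and y as in the diagrams; `tailIsShared` is a name introduced here, and `LanguagePrimitives.PhysicalEquality` is the FSharp.Core function for comparing references rather than contents.

```fsharp
let x = [1; 2; 3]
let y = 42 :: x

// The tail of y is not a copy of x: it is the very same object in memory,
// so a reference comparison (not a structural one) succeeds.
let tailIsShared = LanguagePrimitives.PhysicalEquality (List.tail y) x

// x itself is untouched by the prepend and still has 3 elements.
printfn "x has %d elements, y has %d, tail shared: %b" x.Length y.Length tailIsShared
```

Here `tailIsShared` evaluates to true, which matches the diagram: y is one new cell pointing at the existing cells of x.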


This sort of arrangement, where each iteration of a data structure is built "on top of" the previous one, is called a "persistent data structure" (well, to be more precise, it's one kind of persistent data structure).

The way it works is very easy to see for linked lists, but it also works for more involved data structures, including maps (which are represented as trees).
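To make the tree case a little more concrete, here is a deliberately simplified sketch. It is an unbalanced binary search tree written for illustration only; it is not how FSharp.Core implements Map (the real one is a balanced tree with more machinery), and the names `Tree`, `insert` and `leftOf` are made up here. What it shows is the general idea: an insertion rebuilds only the nodes on the path from the root to the new key, and every subtree off that path is reused.

```fsharp
// Simplified, unbalanced binary search tree. Illustration only, not the real Map.
type Tree =
    | Leaf
    | Node of left: Tree * key: int * right: Tree

// Returns a new tree containing k. Only the nodes on the path from the root
// to the insertion point are rebuilt; every other subtree is reused as-is.
let rec insert k tree =
    match tree with
    | Leaf -> Node (Leaf, k, Leaf)
    | Node (l, key, r) when k < key -> Node (insert k l, key, r)
    | Node (l, key, r) when k > key -> Node (l, key, insert k r)
    | Node _ -> tree   // key already present: hand back the same tree

// Helper: the left subtree of a root node (Leaf if there is none).
let leftOf tree =
    match tree with
    | Node (l, _, _) -> l
    | Leaf -> Leaf

let t1 = List.fold (fun t k -> insert k t) Leaf [50; 30; 70; 20; 40]
let t2 = insert 80 t1   // 80 is greater than 50, so it goes down the right side

// The left subtree (30, 20, 40) was not on the insertion path,
// so t1 and t2 point at the very same object for it.
let sharedLeft = LanguagePrimitives.PhysicalEquality (leftOf t1) (leftOf t2)
printfn "left subtree shared between t1 and t2: %b" sharedLeft
```

In this sketch the insertion of 80 creates a new root and a new right-hand path, while the whole left half is shared between the old and the new tree. A balanced tree does some extra rebuilding when it rebalances, but the cost per insertion still depends on the depth of the tree rather than on its total size.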
