
Bulk adding to a map, in F#

I have a simple type:

type Token =
    {
        Symbol:     string
        Address:    string
        Decimals:   int
    }

and an in-memory cache (the tokens themselves are stored in a database):

let mutable private tokenCache : Map<string, Token> = Map.empty

Both are part of the Tokens module.

Sometimes I get a few new entries to add, in the form of a Token array, and I want to update the cache.

This happens very rarely (less than once per million reads).

When I update the database with the new batch, I want to update the cache map as well, and I just wrote this:

tokenCache <- tokens |> Seq.fold (fun m i -> m.Add(i.Symbol, i)) tokenCache

Since this happens so rarely, I don't really care about the performance, so this question is asked out of curiosity:

When I do this, the map will be recreated once per entry in the tokens array: 10 new tokens, 10 map re-creations. I assumed this was the most 'F#' way to deal with it. It got me thinking: wouldn't it be more efficient to convert the map to a list of KVPs, take the output of distinct, and re-create a map? Or is there another method I haven't thought about?

This is not an answer to the question as stated, but a clarification of something you asked in the comments.

The premise you have expressed is incorrect:

"the map will be recreated once per entry in the tokens array"

The map doesn't actually get completely recreated for every insertion. But at the same time, another hypothesis you have expressed in the comments is also incorrect:

"so the immutability is from the language's perspective, the compiler doesn't recreate the object behind the scenes?"

Immutability is real. But the map also doesn't get recreated every time. Sometimes it does, but not every time.
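A quick way to see that immutability is real is to add an entry and then inspect the old value. This is only a minimal sketch; the names original and extended and the sample data are made up for illustration and do not come from the question.

```fsharp
// Hypothetical sample data: symbol -> number of decimals.
let original = Map.ofList [ ("ETH", 18); ("USDC", 6) ]

// Map.add returns a new map value; it never modifies `original`.
let extended = original |> Map.add "WBTC" 8

printfn "%d %d" original.Count extended.Count      // prints "2 3"
printfn "%b" (Map.containsKey "WBTC" original)     // prints "false"
```

The old map is still fully usable after the insertion, which is what makes it safe for the new map to reuse parts of it.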

I'm not going to describe exactly how Map works, because that's too involved. Instead, I'll illustrate the principle on a list.


F# lists are "singly linked lists", which means each list consists of two things: (1) the first element (called the "head") and (2) a reference (pointer) to the rest of the elements (called the "tail"). The crucial thing to note here is that the "rest of the elements" part is itself also a list.

So if you declare a list like this:

let x = [1; 2; 3]

It would be represented in memory something like this:

x -> 1 -> 2 -> 3 -> []

The name x is a reference to the first element, then each element has a reference to the next one, and the last one has a reference to the empty list. So far so good.
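The head/tail structure can be observed directly with pattern matching or with the standard List.head and List.tail functions. A small sketch:

```fsharp
let x = [1; 2; 3]

// Every non-empty list matches the pattern `head :: tail`,
// where `tail` is itself a list.
match x with
| head :: tail -> printfn "head = %d, tail = %A" head tail   // head = 1, tail = [2; 3]
| [] -> printfn "empty list"

// The same decomposition through library functions.
let firstElement = List.head x     // 1
let restOfElements = List.tail x   // [2; 3]
```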

Now let's see what happens if you add a new element to this list:

let y = 42 :: x

Now the list y will be represented like this:

y -> 42 -> 1 -> 2 -> 3 -> []

But this diagram shows only half the picture. If we look at memory in a wider scope than just y, we'll see this:

    x -> 1 -> 2 -> 3 -> []
         ^
         |
        /
y ->  42

So you see that the y list consists of two things (as all lists do): the first element 42 and a reference to the rest of the elements 1->2->3. But the "rest of the elements" bit is not exclusive to y; it has its own name, x.

And so you have two lists, x and y, with 3 and 4 elements respectively, but together they occupy just 4 cells of memory, not 7.

Another thing to note is that when I created the y list, I did not have to recreate the whole list from scratch. I did not have to copy 1, 2, and 3 from x to y. Those cells stayed right where they were, and y only got a reference to them.

A third thing to note is that this makes prepending an element to a list an O(1) operation. No copying of the list is involved.

And a fourth (and hopefully final) thing to note is that this approach is only possible because of immutability. It is only because I know that the x list will never change that I can take a reference to it. If it were subject to change, I would be forced to copy it just in case.
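The sharing described above can be checked with a reference comparison. In this sketch the name sharesCells is introduced only for illustration; obj.ReferenceEquals is the standard .NET reference-identity check.

```fsharp
let x = [1; 2; 3]
let y = 42 :: x   // allocates one new cell that points at x

// The tail of y is not a copy of x: it is the very same object in memory.
let sharesCells = obj.ReferenceEquals(List.tail y, x)

printfn "%b" sharesCells               // prints "true"
printfn "%d %d" x.Length y.Length      // prints "3 4"
```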


This sort of arrangement, where each iteration of a data structure is built "on top of" the previous one, is called a "persistent data structure" (well, to be more precise, it's one kind of persistent data structure).

The way it works is very easy to see for linked lists, but it also works for more involved data structures, including maps (which are represented as trees).
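Applied to the cache from the question, the two approaches can be compared side by side. This is a sketch under assumptions: the sample tokens and the names cache, newTokens, folded and rebuilt are invented for illustration, and the rebuild variant relies on Map.ofSeq keeping the last value for a duplicate key instead of calling distinct explicitly.

```fsharp
type Token =
    {
        Symbol:     string
        Address:    string
        Decimals:   int
    }

// Hypothetical sample data.
let cache : Map<string, Token> =
    Map.ofList [ ("AAA", { Symbol = "AAA"; Address = "0x01"; Decimals = 18 }) ]

let newTokens =
    [|
        { Symbol = "BBB"; Address = "0x02"; Decimals = 6 }
        { Symbol = "CCC"; Address = "0x03"; Decimals = 8 }
    |]

// Approach from the question: one Map.add per new token.
let folded =
    newTokens
    |> Array.fold (fun (m: Map<string, Token>) t -> Map.add t.Symbol t m) cache

// Alternative from the question: flatten to key/value pairs and rebuild the map.
let rebuilt =
    Seq.append (Map.toSeq cache) (newTokens |> Seq.map (fun t -> (t.Symbol, t)))
    |> Map.ofSeq

printfn "%b" (folded = rebuilt)   // prints "true"
printfn "%d" cache.Count          // prints "1"
```

Both produce the same map, and the original cache value is untouched. Since each Map.add should only need to rebuild the nodes along one path of the tree, the fold does work proportional to the number of new tokens, whereas the rebuild visits every existing entry again, so the rebuild is unlikely to be the cheaper option for a small batch.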
