reusing std::unordered_map efficiently
I manage relatively small transient dictionaries in my program. My question: is it significantly more efficient to reuse them (with mymap.clear() after use), rather than to delete the old ones and create new ones?
Also, these dictionaries are currently implemented as std::unordered_map<std::string, int>. This works, but if (in light of the above usage pattern) another container (STL or not) is preferable, I won't hesitate to switch this implementation.
Did you profile it? Because right now it's just a lot of guesswork.
Consider that new and delete on the std::unordered_map just add the overhead of instantiating / tearing down the container itself. std::unordered_map::clear internally will still call delete on every object it holds, so that its destructor is invoked. There might be a fancy allocator involved that implements a pool of identically sized slots for the container elements, to save on memory-management overhead.
Depending on the complexity of the contained objects, it may or may not be more sensible to use a plain std::vector.
You'll have to profile where your overhead is. But more importantly, only go through the work if this is a part of your program that causes a statistically significant slowdown. You should choose readability and implementation clarity over micro-optimizations.
Unfortunately, there isn't any performance advantage to .clear() and reuse over just getting a new node-based container; it's nearly the same amount of work.
If you know the maximum size of your dictionary, and it is reasonably small, consider using a custom allocator for the nodes. That way, you might get things more compact and save on allocation overhead. Aside from that, other containers outside the standard library which avoid allocating thousands of individual nodes are a possibility.
"This works, but if (in light of the above usage pattern) another container (STL or not) is preferable, I won't hesitate to switch this implementation."
OK choice for a start. If you want to try something else: measure performance on a real scenario with real data to see if alternatives are worth using.
For GCC at least, std::unordered_map<std::string, int>, at any point in time, has dynamic allocations as follows:

- the bucket array itself
- a node per element, holding the std::string and int data
- for any std::string too long for the Short String Optimisation (where the text content is stored directly in the std::string object), a dynamically allocated text buffer

When you do a .clear(), the latter two categories of allocations are deallocated. When the container itself is destructed, only one extra deallocation is done.
So, I wouldn't expect much performance improvement from keeping the unordered_maps around.
If you care about performance, look more carefully at your data. Is there an upper bound to string length? If there is and it's not large (e.g. 8 or 16 bytes), you could grab a hash table using open addressing (aka closed hashing), where the keys and values are stored directly in the buckets, so there's just one dynamic allocation going on. That could be expected to give you a large performance improvement (but always measure).
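As a sketch of what such an open-addressing table could look like (a hypothetical helper, not a library API: linear probing, keys up to 15 bytes stored inline, one allocation for the whole table, no resizing or deletion, and the capacity must exceed the element count):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <optional>
#include <string_view>
#include <vector>

class SmallKeyMap {
    struct Slot {
        std::array<char, 16> key{};  // NUL-terminated; all-zero means vacant
        int value = 0;
        bool used = false;
    };
    std::vector<Slot> slots_;  // the single dynamic allocation

    // FNV-1a: simple, decent hash for short strings.
    static std::size_t hash(std::string_view k) {
        std::size_t h = 1469598103934665603ull;
        for (char c : k) {
            h ^= static_cast<unsigned char>(c);
            h *= 1099511628211ull;
        }
        return h;
    }

public:
    explicit SmallKeyMap(std::size_t capacity) : slots_(capacity) {}

    void insert(std::string_view k, int v) {
        assert(k.size() < 16);
        std::size_t i = hash(k) % slots_.size();
        while (slots_[i].used && k != slots_[i].key.data())
            i = (i + 1) % slots_.size();  // linear probe to next slot
        slots_[i].used = true;
        std::memcpy(slots_[i].key.data(), k.data(), k.size());
        slots_[i].key[k.size()] = '\0';
        slots_[i].value = v;
    }

    std::optional<int> find(std::string_view k) const {
        std::size_t i = hash(k) % slots_.size();
        while (slots_[i].used) {
            if (k == slots_[i].key.data()) return slots_[i].value;
            i = (i + 1) % slots_.size();
        }
        return std::nullopt;  // hit a vacant slot: key absent
    }
};
```

With the keys inline, a lookup walks contiguous memory instead of chasing per-node pointers, which is where the speedup would come from; in practice you would reach for a mature open-addressing implementation rather than hand-rolling one, and always measure against the std::unordered_map baseline.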