I have an unordered_map
storing large objects:
std::unordered_map<uint64_t, LargeObject> map;
LargeObject
is an array of POD/there are no pointer members.
I am receiving a lot of LargeObject
s, so I only want to insert them in the map if they don't already exist (there will be a lot of times they do already exist, as I am listening to multiple sources).
Performance matters, as I am receiving millions.
There seem to be several ways of doing this:
map[key] = largeObject;
or:
map.insert(std::make_pair<uint64_t, LargeObject>(key, largeObject);
or:
if(map.count(key) == 0)
{
map[key] = largeObject;
}
or:
auto iter = map.find(key);
if(iter == map.end())
{
map[key] = largeObject;
}
There maybe more I haven't included.
Which is the most efficient technique to insert when the map's value is a large object?
This is what try_emplace
is for. The API is set up in such a way as to avoid constructing nodes, or even the value, unless it's strictly necessary. It comes form N4279 .
In your case, that would be:
auto [it, success] = map.try_emplace(key, largeObject);
Each of the four options in the OP has issues:
map[key] = largeObject
doesn't actually do what you're asking for, it would overwrite the existing item. And even if it wasn't there, it requires default construction and copy assignment.
The approaches with count
and find
both require two lookups.
map.insert(std::make_pair<uint64_t, LargeObject>(key, largeObject));
is a single lookup but requires constructing the large object, and the pair, unconditionally.
Not mentioned in the OP is another option: map.emplace(key, largeObject);
This has the issue that it's actually under-specified whether or not the pair
is created in the case that the key
exists. It does on some implementations. The motivation for try_emplace
was to properly specify the API so that the pair
definitely does not get created if the key
already exists.
If "Performance matters", and the map is large don't use std::unordered_map
, since it is known to be quite slow ; and you are barking up the wrong tree w.r.t. insertion.
Here's a by-now-old SO question suggesting several alternatives:
Super high performance C/C++ hash map (table, dictionary)
Of course, you will need to choose something that fits your workload best.
Additionally - if your objects are large, it is probably better to avoid a map structure templated over the actual large object - since that usually guarantees you will at least have to copy it when inserting into the map;and you don't want to have to copy large objects. Use some kind of reference type (eg a pointer, or some numeric index usable to get the address of the large object), and have your hasher dereference that.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.