Would I see a performance gain using std::map instead of vector<pair<string, string> >?

Original post: 2012-10-02 18:35:57 · Tags: c++, stl, stdvector, stdmap

I currently have some code where I am using a vector of pair<string,string>. It is used to store data from XML parsing, and the process is quite slow in places. While trying to speed up the whole process, I was wondering whether there would be any performance advantage in switching from vector<pair<string,string> > to std::map<string,string>. I could code it up and run a profiler, but I thought I would first see whether anyone could point to an obvious performance gain. I don't need any sorting: I simply add items to the vector and then, at a later stage, iterate over the contents and do some processing. My guess is that I would not get any performance gain, but I have never actually used a std::map before, so I can't tell without asking or coding it all up.

5 answers

No. If, as you say, you are simply iterating over the collection, you will see a small (probably not measurable) performance decrease by using a std::map.

Maps are for accessing a value by its key. If you never do that, map is a bad choice of container.

If you are not modifying your vector<pair<string,string> >, just iterating over it again and again, you will get a performance degradation by using map. This is because a typical map is organized as a binary search tree (usually a red-black tree) whose nodes are allocated separately and can end up in different memory blocks (unless you write your own allocator). Plus, each map node stores pointers to its neighboring nodes, which costs both time and memory. In exchange, search by key in a map is an O(log N) operation. A vector, on the other hand, holds its data in one contiguous block, so it is usually much friendlier to the processor cache. Searching an unsorted vector is an O(N) operation, which is not great but often acceptable. Search in a sorted vector can be improved to O(log N) using std::lower_bound and related functions.

It depends on the operations you perform on this data. If you do many searches, it is probably better to use a hashing container like unordered_map, since search by key in such a container is O(1) on average. For iterating, as mentioned, vector is faster.

It may also be worth replacing the strings in your pair, but this depends heavily on what you store there and how you access the container.

The answer depends on what you are doing with these data structures and how large they are. If you have thousands of elements in your std::vector<std::pair<std::string, std::string> > and you keep searching by the first member (the key) over and over, using a std::map<std::string, std::string> may improve performance (for this use case you might want to consider std::unordered_map<std::string, std::string> instead). If your vectors are relatively small and you don't insert elements into the middle too often, vectors may very well be faster. If you just iterate over the elements, vectors are a lot faster than maps: iteration isn't really one of maps' strengths. Maps are good at looking things up, assuming the number of elements isn't really small; otherwise, a linear search over a vector is still faster.

The best way to determine where the time is spent is to profile the code: it is often not clear up front where the time goes. Frequently, the suspected hot spots turn out to be unproblematic, while other areas show unexpected performance problems. For example, you might be passing your objects by value rather than by reference in some obscure place.

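If your usage pattern performs many insertions before any lookups, you may benefit from implementing a "lazy" map, in which the elements are sorted on demand (i.e., when you obtain an iterator, perform a lookup, and so on).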

std::vector stores its items in contiguous memory. It first allocates a memory block with some initial capacity; when you insert a new item, it checks whether there is room left. If not, it allocates a new, larger buffer, copy-constructs (or, since C++11, moves) all existing items into it, destroys the old items, and frees the old buffer.

When you insert a lot of items into a vector without preparing for it, you suffer from many reallocations, copy constructions, and destructor calls.

To solve this problem, if you know the number of input items (not necessarily exactly, but their usual count), you can reserve that much memory for the vector up front and avoid most or all of the reallocations. If you have no idea of the size, you can use a container like std::list, which never reallocates its existing items (at the cost of a separate allocation for every element).