STL sort performance comparison: vector of strings vs. vector of pointers to strings

Question

I tried to compare the performance of STL sort on a vector of strings and on a vector of pointers to strings.

I expected the pointer version to be faster, but the actual results for 5 million randomly generated strings are:

vector of strings: 12.06 seconds
vector of pointers to strings: 16.75 seconds

What explains this behavior? I expected swapping pointers to strings to be faster than swapping string objects.

The 5 million strings were generated by converting random integers.
Compiled with (gcc 4.9.3): g++ -std=c++11 -Wall
CPU: Xeon X5650

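// sort vector of strings
    #include <algorithm>
    #include <cstdlib>
    #include <ctime>
    #include <iostream>
    #include <string>
    #include <vector>
    using namespace std;

    int main(int argc, char *argv[])
    {
      const int numElements = 5000000;
      srand(time(NULL));
      vector<string> vec(numElements);

      // fill with decimal representations of random integers
      for (int i = 0; i < numElements; i++)
        vec[i] = to_string(rand() % numElements);

      clock_t before = clock();

      sort(vec.begin(), vec.end());

      // clock() counts ticks of CPU time; divide by CLOCKS_PER_SEC for seconds
      cout << "Time to sort: " << double(clock() - before) / CLOCKS_PER_SEC << " s" << endl;

      for (int i = 0; i < numElements; i++)
        cout << vec[i] << endl;

      return 0;
    }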



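// sort vector of pointers to strings
    #include <algorithm>
    #include <cstdlib>
    #include <ctime>
    #include <iostream>
    #include <string>
    #include <vector>
    using namespace std;

    // compare the pointed-to strings, not the pointer values
    bool comparePtrToString(const string *s1, const string *s2)
    {
      return (*s1 < *s2);
    }

    int main(int argc, char *argv[])
    {
      const int numElements = 5000000;
      srand(time(NULL));
      vector<string *> vec(numElements);

      // each string object is allocated separately on the free store
      for (int i = 0; i < numElements; i++)
        vec[i] = new string(to_string(rand() % numElements));

      clock_t before = clock();

      sort(vec.begin(), vec.end(), comparePtrToString);

      // clock() counts ticks of CPU time; divide by CLOCKS_PER_SEC for seconds
      cout << "Time to sort: " << double(clock() - before) / CLOCKS_PER_SEC << " s" << endl;

      for (int i = 0; i < numElements; i++)
        cout << *vec[i] << endl;

      // release the individually allocated strings
      for (int i = 0; i < numElements; i++)
        delete vec[i];

      return 0;
    }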
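
For a like-for-like measurement, both variants can be timed on identical input. The following is only a minimal sketch, not the asker's code: the helper names make_input, time_sort_strings, and time_sort_pointers are made up for illustration, std::mt19937 with a fixed seed stands in for rand(), and std::chrono::steady_clock stands in for clock(). Absolute timings will vary with compiler, standard library, and optimization level (the command line in the question shows no -O flag).

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper: n decimal strings from a fixed-seed generator,
// so that both variants sort exactly the same data.
std::vector<std::string> make_input(std::size_t n, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int> dist(0, static_cast<int>(n > 0 ? n - 1 : 0));
    std::vector<std::string> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i)
        out.push_back(std::to_string(dist(gen)));
    return out;
}

// Sorts the strings in place and returns the elapsed wall-clock seconds.
double time_sort_strings(std::vector<std::string>& v) {
    auto start = std::chrono::steady_clock::now();
    std::sort(v.begin(), v.end());
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Sorts individually heap-allocated copies through pointers, as in the
// question. `out` receives the sorted values so the results can be compared.
double time_sort_pointers(const std::vector<std::string>& input,
                          std::vector<std::string>& out) {
    std::vector<std::string*> ptrs;
    ptrs.reserve(input.size());
    for (const std::string& s : input)
        ptrs.push_back(new std::string(s));

    auto start = std::chrono::steady_clock::now();
    std::sort(ptrs.begin(), ptrs.end(),
              [](const std::string* a, const std::string* b) { return *a < *b; });
    std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;

    out.clear();
    out.reserve(ptrs.size());
    for (std::string* p : ptrs) {
        out.push_back(*p);
        delete p;  // release each separately allocated string
    }
    assert(out.size() == input.size());
    return elapsed.count();
}
```

Calling make_input once, copying the result for time_sort_strings, and passing the original to time_sort_pointers keeps the two measurements comparable; only the sort call itself falls inside each timed region.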

Answer 1

This is because, apart from the comparisons, all the operations that sort performs on the strings are moves and swaps. Both move and swap for a std::string are constant-time operations, meaning that they only involve changing some pointers.

Therefore, moving the data has about the same performance overhead in both sorts. However, with pointers to strings you pay an extra cost to dereference the pointers on each comparison, which makes that version noticeably slower.
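
To illustrate the constant-time claim, the sketch below checks whether swapping two long strings exchanges their character buffers instead of copying characters. The function name swap_exchanges_buffers is made up for illustration. The buffer exchange is the usual behavior of mainstream standard libraries for heap-allocated strings, not something the C++ standard guarantees, and short strings kept in a small-string buffer are copied instead.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Returns true if swapping two strings of `len` characters merely exchanged
// their character buffers (no per-character copy into new storage).
// Assumption: this reflects common implementations, not a standard guarantee.
bool swap_exchanges_buffers(std::size_t len) {
    std::string a(len, 'a');
    std::string b(len, 'b');
    const char* a_data = a.data();
    const char* b_data = b.data();

    std::swap(a, b);

    // The contents must have traded places on any implementation.
    assert(a == std::string(len, 'b') && b == std::string(len, 'a'));
    return a.data() == b_data && b.data() == a_data;
}
```

For a length such as 1000 this is expected to return true on common implementations; for very short strings the result depends on whether the library uses a small-string optimization.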

Answer 2

In the first case, the strings' internal pointers to their character data are swapped; the complete data is not copied.

You should not expect any benefit from the implementation with pointers. It is in fact slower, because the pointers have to be dereferenced additionally to perform each comparison.

Answer 3

"What explains this behavior? I expected swapping pointers to strings should be faster than swapping string objects."

There are various things going on here that could impact performance.

Swapping is relatively cheap both ways. Swapping strings tends to be a shallow operation (just swapping PODs like pointers and integers) for large strings and possibly deep for small strings (but still quite cheap, and implementation-dependent). So swapping strings tends to be pretty cheap overall, and typically not much more expensive than simply swapping pointers to them*.

[* sizeof(string) is typically bigger than sizeof(string*), but the difference is not astronomical, since the operation still occurs in constant time. It is quite a bit cheaper in this context because the string fields already have to be fetched into a faster form of memory for the comparator, which gives us temporal locality with respect to those fields.]

String contents must be accessed either way. Even the pointer version of your comparator has to examine the string contents (including the fields designating size and capacity). As a result, we end up paying the memory cost of fetching the string contents regardless. Naturally, if you just sorted the strings by pointer address (e.g. without using a comparator) instead of by a lexicographical comparison of the string contents, the performance edge should shift towards the pointer version, since that would considerably reduce the amount of data accessed while improving spatial locality (more pointers than strings fit in a cache line, for example).

The pointer version scatters the string fields in memory (or at least increases their stride). For the pointer version, you're allocating each string object on the free store (in addition to the string contents, which may or may not be allocated on the free store). That can disperse the memory and reduce locality of reference, so you're potentially incurring a greater cost in the comparator through increased cache misses. Even if a sequence of allocations like this results in a very contiguous set of pages being allocated (the ideal scenario), the stride from one string's fields to the next would tend to get at least a little larger because of allocation metadata and alignment overhead (not all allocators store metadata directly in a chunk, but typically they will at least add some small overhead to the chunk size).

It might be simpler to attribute this to the cost of dereferencing the pointer, but what is expensive (in this relative context) is not so much the mov/load instruction doing the memory addressing as loading from slower, bigger forms of memory that aren't already cached/paged into faster, smaller memory. Allocating each string individually on the free store will typically increase this cost, whether through a loss of contiguity or through a larger constant stride between string entries (in the ideal case).

Even at a basic level, without trying too hard to diagnose what's happening at the memory level, this increases the total size of the data that the machine has to look at (string contents/fields + pointer address), in addition to reducing locality and introducing larger or variable strides (typically, if you increase the amount of data accessed, it has to at least have improved locality to have a good chance of being beneficial). You might start to see more comparable times if you sorted pointers to strings that were allocated contiguously (not in terms of the string contents, which we have no control over, but in terms of the adjacent string objects themselves: effectively pointers to strings stored in an array). Then you'd get back the spatial locality at least for the string fields, in addition to packing the associated data more tightly within a contiguous space.
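
The last suggestion, sorting pointers to strings whose objects sit contiguously in an array, might look like the following. This is a minimal sketch with a made-up function name, sorted_pointers_into; whether it actually narrows the gap depends on the allocator and on cache behavior, and would need to be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns pointers to the elements of `storage`, ordered by string value.
// The string objects themselves stay contiguous and are never moved, so the
// pointers remain valid as long as `storage` is not resized or destroyed.
std::vector<const std::string*> sorted_pointers_into(
        const std::vector<std::string>& storage) {
    std::vector<const std::string*> ptrs;
    ptrs.reserve(storage.size());
    for (const std::string& s : storage)
        ptrs.push_back(&s);

    std::sort(ptrs.begin(), ptrs.end(),
              [](const std::string* a, const std::string* b) { return *a < *b; });

    assert(ptrs.size() == storage.size());
    return ptrs;
}
```

Only the pointers are reordered here; the comparator still dereferences them, so the string contents are read just as in the question's pointer version.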

Swapping smaller data types like indices or pointers can sometimes offer a benefit, but to do so they typically need to avoid examining the original contents of the data they refer to, or provide a significantly cheaper swap/move (in this case string is already cheap to swap, and becomes cheaper still in this context considering temporal locality), or both.
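
As a contrast to the content-based comparator, the address-only ordering mentioned earlier in this answer can be sketched as below. It never reads the string contents, which is why it could favor pointers, but the resulting order is by memory address and has no lexicographic meaning. The function name sort_by_address is made up for illustration; std::less is used because it provides a total order over pointers, which the built-in < does not guarantee for pointers to unrelated objects.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Orders the pointers by address only; no string is dereferenced.
void sort_by_address(std::vector<const std::string*>& ptrs) {
    std::less<const std::string*> by_address;
    std::sort(ptrs.begin(), ptrs.end(), by_address);
    assert(std::is_sorted(ptrs.begin(), ptrs.end(), by_address));
}
```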

Answer 4

Well, a std::string is typically about 3-4 times as big as a std::string*.
So just straight-up swapping two of the former shuffles that much more memory around.

But that is dwarfed by the following effects:

Locality of reference. You need to follow one more pointer to a random position to read the string.
More memory usage: a pointer plus bookkeeping per allocation of each std::string.

Both put extra demand on caching, and the former cannot even be prefetched.
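
The size and bookkeeping points can be inspected directly. The sketch below, using the made-up names LayoutReport and measure_layout, reports sizeof(std::string), sizeof(std::string*), and the smallest address gap between individually allocated string objects. All three numbers are implementation-specific. For example, the reference-counted std::string in libstdc++ before GCC 5 (which includes the gcc 4.9.3 used in the question) is a single pointer wide, while small-string implementations are several pointers wide.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

struct LayoutReport {
    std::size_t string_size;   // sizeof(std::string)
    std::size_t pointer_size;  // sizeof(std::string*)
    std::size_t min_gap;       // smallest gap between separately allocated strings
};

// Allocates `count` strings individually and measures how far apart the
// allocator placed the string objects. Requires count >= 2.
LayoutReport measure_layout(std::size_t count) {
    assert(count >= 2);
    std::vector<std::unique_ptr<std::string>> owned;
    std::vector<std::uintptr_t> addrs;
    for (std::size_t i = 0; i < count; ++i) {
        owned.push_back(std::unique_ptr<std::string>(new std::string("x")));
        addrs.push_back(reinterpret_cast<std::uintptr_t>(owned.back().get()));
    }
    std::sort(addrs.begin(), addrs.end());

    // Smallest distance between the start addresses of two neighboring objects.
    std::size_t min_gap = static_cast<std::size_t>(addrs[1] - addrs[0]);
    for (std::size_t i = 2; i < addrs.size(); ++i)
        min_gap = std::min(min_gap, static_cast<std::size_t>(addrs[i] - addrs[i - 1]));

    return LayoutReport{sizeof(std::string), sizeof(std::string*), min_gap};
}
```

A min_gap larger than string_size would reflect alignment padding and allocator bookkeeping, which is the per-allocation overhead this answer refers to.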

Answer 5

Swapping containers changes only the container's own bookkeeping data. In the string case that is the pointer to the first character of the string, not the whole string.

With a vector of pointers to strings, you perform one additional step: dereferencing the pointers.

5 answers

Answer 1: score 5, accepted, 2015-11-17 22:09:34
Answer 2: score 2, 2015-11-17 22:08:51
Answer 3: score 2, 2015-11-17 22:44:02
Answer 4: score 1, 2015-11-17 22:57:05
Answer 5: score 0, 2015-11-17 22:08:31
