Are two heap allocations more expensive than a call to std::string fill ctor?

Question

I want to have a string with a capacity of 131 chars (or bytes). I know two simple ways of achieving that. So which of these two code blocks is faster and more efficient?

std::string tempMsg( 131, '\0' ); // constructs the string with a 131 byte buffer from the start
tempMsg.clear( ); // clears those '\0' chars to free space for the actual data

tempMsg += "/* some string literals that are appended */";

or this one:

std::string tempMsg; // default constructs the string with a 16 byte buffer
tempMsg.reserve( 131 ); // reallocates the string to increase the buffer size to 131 bytes??

tempMsg += "/* some string literals that are appended */";

I guess the first approach only uses 1 allocation and then sets all those 131 bytes to 0 ('\0') and then clears the string (std::string::clear is generally constant according to: https://www.cplusplus.com/reference/string/string/clear/ ). The second approach uses 2 allocations but on the other hand, it doesn't have to set anything to '\0'. But I've also heard about compilers allocating 16 bytes on the stack for a string object for optimization purposes. So the 2nd method might use only 1 heap allocation as well.

So is the first method faster than the other one? Or are there any other better methods?

Answer 1

The most accurate answer is that it depends. The most probable answer is that the second is faster or as fast. Calling the fill ctor requires not only a heap allocation but also a fill (which typically translates to a memset in my experience).

clear usually won't do anything with a POD char besides setting a first pointer or size integer to zero, because char is a trivially-destructible type. There's usually no loop involved in clear unless you create std::basic_string with a non-trivial UDT. ~~It's constant-time otherwise and dirt-cheap in practically every standard library implementation.~~

Edit: An Important Note: I never encountered a standard lib implementation that does this, or it has slipped my memory (very possible as I think I'm turning senile), but Viktor Sehl pointed out something very important in the comments that I was very ignorant about:

Please note that std::string::clear() on some implementations frees the allocated memory (if there is any), unlike a std::vector.

That would actually make your first version involve two heap allocations. But the second should still only be one (opposite of what you thought).
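Whether clear() keeps the buffer is something you can check on your own toolchain rather than take on faith. Here is a minimal sketch that reports the capacity of the fill-constructed string before and after clear(). The names ClearReport and inspect_clear are made up for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper type, for illustration only.
struct ClearReport {
    std::size_t capacity_before;  // capacity right after the fill ctor
    std::size_t capacity_after;   // capacity right after clear()
    std::size_t size_after;       // size right after clear()
};

// Builds a string of n '\0' chars, clears it, and records what happened.
ClearReport inspect_clear(std::size_t n) {
    std::string s(n, '\0');
    ClearReport r{};
    r.capacity_before = s.capacity();
    s.clear();
    r.capacity_after = s.capacity();
    r.size_after = s.size();
    return r;
}
```

If capacity_after comes back smaller than capacity_before for inspect_clear(131), your implementation released the buffer in clear(), and the first version will pay for a second allocation when you append.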

Resumed:

But I've also heard about compilers allocating 16 bytes on the stack for a string object for optimization purposes. So the 2nd method might use only 1 heap allocation as well.

The first allocation is a small-buffer optimization for implementations that use it (technically the buffer is not always on the stack, but it avoids additional heap allocations). It's not separately heap-allocated and you can't avoid it with a fill ctor (the fill ctor will still allocate the small buffer). What you can avoid is filling the entire array with '\0' before you fill it with what you actually want, and that's why the second version is likely faster (marginally or not, depending on how many times you invoke it from a loop). That's needless overhead unless the optimizer eliminates it for you, and it's unlikely in my experience that optimizers will do that in loopy cases that can't be optimized with something like SSA.
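To see how many heap allocations each version really performs on a given standard library, one option is to instantiate std::basic_string with an allocator that counts calls. This is a rough sketch under that assumption; CountingAllocator, CountedString, g_alloc_count and the two functions are hypothetical names invented for this example, and the counts you get are specific to your implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Global counter bumped on every allocate() call (illustration only).
static std::size_t g_alloc_count = 0;

// Minimal allocator that forwards to std::allocator and counts allocations.
template <class T>
struct CountingAllocator {
    using value_type = T;

    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        ++g_alloc_count;
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>{}.deallocate(p, n);
    }
};

template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) noexcept {
    return false;
}

using CountedString =
    std::basic_string<char, std::char_traits<char>, CountingAllocator<char>>;

// First version from the question: fill ctor, clear, append.
std::size_t allocs_fill_then_clear() {
    g_alloc_count = 0;
    CountedString s(131, '\0');
    s.clear();
    s += "/* some string literals that are appended */";
    return g_alloc_count;
}

// Second version from the question: default ctor, reserve, append.
std::size_t allocs_reserve() {
    g_alloc_count = 0;
    CountedString s;
    s.reserve(131);
    s += "/* some string literals that are appended */";
    return g_alloc_count;
}
```

On an implementation where clear() keeps the buffer, I would expect both functions to return 1. A result of 2 from allocs_fill_then_clear() would point to the behavior described in the comment quoted above.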

I just pitched in here because your second version is also clearer in intent than filling a string with something as an attempted optimization (in this case a very misguided one if you ask me) only to throw it out and replace it with what you actually want. The second is at least clearer in intent and almost certainly as fast or faster in most implementations. I would suggest always measuring if in doubt though, and especially before you start attempting funny things like in your first example. I can't recommend the profiler enough if you're working in performance-critical fields. The profiler will not only answer this question for you but it'll also teach you to refrain from writing counter-intuitive code like in the first example except in places where it makes a real positive difference (in this case I think the difference is actually negative or neutral).
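If you don't have a profiler at hand, a crude timing loop is a reasonable first step. The following is only a sketch: build_fill_clear, build_reserve and time_ns are hypothetical names, and a loop like this can easily be distorted by the optimizer, allocator caching and timer resolution, so treat the numbers as a hint and not as a verdict.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// First version from the question.
std::string build_fill_clear() {
    std::string s(131, '\0');
    s.clear();
    s += "/* some string literals that are appended */";
    return s;
}

// Second version from the question.
std::string build_reserve() {
    std::string s;
    s.reserve(131);
    s += "/* some string literals that are appended */";
    return s;
}

// Runs build() `iterations` times and returns the elapsed nanoseconds.
// The volatile sink discourages the compiler from discarding the work.
template <class Builder>
long long time_ns(Builder build, std::size_t iterations) {
    volatile std::size_t sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        sink = sink + build().size();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Calling time_ns(build_fill_clear, 1000000) and time_ns(build_reserve, 1000000) in an optimized build, several times each, gives a first impression. A real profiler on your actual workload is still the better tool.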

If the small buffer optimization confuses you a bit, a simple illustration is like this:

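#include <cstddef>

// Illustrative value only; real implementations pick their own size.
constexpr std::size_t some_small_fixed_size = 16;

struct SomeString
{
    // Pre-allocates (always) some memory in advance to avoid additional
    // heap allocs.
    char small_buffer[some_small_fixed_size] = {};

    // Will point to small buffer until string gets large.
    char* ptr = small_buffer;
};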

The allocation of the small buffer is unavoidable, but it doesn't require separate calls to malloc/new/new[]. And it's not allocated separately on the heap from the string object itself (if the object is allocated on the heap). So both of the examples that you showed involve, at most, a single heap allocation ( ~~unless your standard library implementation is FUBAR~~ -- edit: or one that Viktor is using ). What the first example conceptually adds on top of that is a fill/loop (it could be implemented as a very efficient intrinsic in assembly, but it is loopy/linear-time work nevertheless) unless the optimizer eliminates it.

So is the first method faster than the other one? Or are there any other better methods?

I might upset some C++ devs but you can write your own string type which uses an SBO with, say, 256 bytes for the small buffer. Then you can avoid heap allocations entirely for your 131-length case. That would be ill-suited for persistent storage though because it would blow up memory use (ex: requiring 256+ bytes just to store a string with one character in it). It's well-suited for temporary strings though. I'm primarily a gamedev. Rolling our own alternatives to the standard C++ library is quite normal here given our requirements for real-time feedback with high graphical fidelity. I wouldn't recommend it for the faint-hearted though, and definitely not without a profiler. This is a very practical and viable option in my field. It might be ridiculous in yours. The standard lib is excellent but it's tailored for the needs of the entire world. You can usually beat it if you can tailor your code very specifically to your needs and produce more narrowly-applicable code.
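To make the idea concrete, here is a minimal sketch of such a temporary string with a 256-byte inline buffer. TempString and all of its members are hypothetical names for this illustration; it only supports appending C strings, is non-copyable, and does not handle appending a pointer into its own buffer, so it is nowhere near a std::string replacement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical temporary-string type with a large inline (small) buffer.
class TempString {
public:
    static constexpr std::size_t kInlineCapacity = 256;

    TempString() { inline_buf_[0] = '\0'; }
    TempString(const TempString&) = delete;
    TempString& operator=(const TempString&) = delete;

    // Appends a null-terminated string; text must not alias this object.
    void append(const char* text) {
        const std::size_t len = std::strlen(text);
        const std::size_t needed = size_ + len + 1;  // +1 for the terminator
        if (needed > capacity_) grow(needed);
        std::memcpy(data() + size_, text, len + 1);
        size_ += len;
    }

    const char* c_str() const { return heap_buf_ ? heap_buf_.get() : inline_buf_; }
    std::size_t size() const { return size_; }
    bool on_heap() const { return heap_buf_ != nullptr; }

private:
    char* data() { return heap_buf_ ? heap_buf_.get() : inline_buf_; }

    // Moves the contents to a heap buffer of at least `needed` bytes.
    void grow(std::size_t needed) {
        std::size_t new_cap = capacity_ * 2;
        if (new_cap < needed) new_cap = needed;
        std::unique_ptr<char[]> bigger(new char[new_cap]);
        std::memcpy(bigger.get(), data(), size_ + 1);
        heap_buf_ = std::move(bigger);
        capacity_ = new_cap;
    }

    char inline_buf_[kInlineCapacity];   // lives inside the object itself
    std::unique_ptr<char[]> heap_buf_;   // only used once the text outgrows it
    std::size_t size_ = 0;
    std::size_t capacity_ = kInlineCapacity;
};
```

A TempString declared as a local variable keeps a 131-character message entirely inside the object, so no call to new happens until the text exceeds 255 characters plus the terminator. The price is that sizeof(TempString) is well over 256 bytes.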

Actually std::string with SBOs is ill-suited for persistent storage anyway, because if you store something like std::unordered_map<std::string, T> and std::string uses a 16-byte SBO, inflating sizeof(std::string) to 32 bytes or more, then your keys will require 32 bytes even if they store just one character, fitting only two strings or fewer in a single cache line on traversal of the hash table. That's a downside to using SBOs. They can blow up your memory use for persistent storage that's part of your application state. But they're excellent for temporaries whose memory is just pushed and popped to/from the stack in a LIFO alloc/dealloc pattern.
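The footprint is easy to check for your own standard library. A tiny sketch, assuming a 64-byte cache line (that figure, kAssumedCacheLine and strings_per_cache_line are assumptions made for this example, not something the standard provides here):

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed cache line size; typical for current x86-64 CPUs, but an assumption.
constexpr std::size_t kAssumedCacheLine = 64;

// How many std::string objects fit side by side in one assumed cache line.
constexpr std::size_t strings_per_cache_line() {
    return kAssumedCacheLine / sizeof(std::string);
}
```

On common 64-bit implementations sizeof(std::string) is 24 or 32 bytes, so strings_per_cache_line() yields two, regardless of how short the stored text is.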
