How does the std::basic_string constructor know beforehand how much space to reserve?

std::basic_string has the following constructor, which initializes the string with the contents of the null-terminated string pointed to by s:

std::basic_string(const CharT* s, const Allocator& alloc = Allocator());

But how does the constructor know beforehand how much space to reserve for the string in its internal buffer?

I could think of two methods:

1) It could first go through the whole null-terminated string until it finds the first null character, remember how many characters it traversed, use that as the capacity of its internal buffer, and then start copying.

Disadvantage: It has to read the string twice, once to count the characters and a second time to copy them.

2) It could reserve a conservative amount in its internal buffer and just start copying. If it hits the null character before the buffer runs out, we're OK; otherwise it needs to reserve more space (again by a conservative amount) and repeat the steps.

Disadvantage: If the string is fairly large, the overhead of constantly readjusting the capacity might become noticeable.

So, what does a sane std::basic_string implementation do (or is this even specified in the standard)?

Common implementations walk the original string to calculate its length and then allocate that much space. This means traversing the string twice, but finding the length is a fast operation, in some cases with hardware support, and even without hardware support it is probably cheap compared with a single memory allocation.
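To make the count-then-copy idea concrete, here is a minimal sketch of the two passes. It is not taken from any real standard library; the names OwnedCString and duplicate_two_pass are made up for illustration, and a real basic_string would go through its allocator rather than std::make_unique.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>   // std::char_traits
#include <utility>

// Hypothetical result type: an owned copy of a C string plus its length.
struct OwnedCString {
    std::unique_ptr<char[]> data;  // buffer of length + 1 chars
    std::size_t length;            // number of chars before the terminator
};

// Hypothetical helper: duplicates a null-terminated string in two passes.
inline OwnedCString duplicate_two_pass(const char* s) {
    assert(s != nullptr);
    using traits = std::char_traits<char>;

    // Pass 1: scan for the terminator to learn the length.
    const std::size_t n = traits::length(s);

    // One allocation of exactly the required size (plus the terminator).
    auto buffer = std::make_unique<char[]>(n + 1);

    // Pass 2: copy the characters, then terminate the copy.
    traits::copy(buffer.get(), s, n);
    buffer[n] = '\0';

    return {std::move(buffer), n};
}
```

The point of the sketch is that the second read buys a single, exactly sized allocation.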

The first approach is the answer. Per the standard, §21.4.2:

basic_string(const charT* s, const Allocator& a = Allocator());

9 Effects: Constructs an object of class basic_string and determines its initial string value from the array of charT of length traits::length(s) whose first element is designated by s...

and

10 Remarks: Uses traits::length().
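One way to observe this yourself is to plug in a traits class that counts calls to length(). The sketch below assumes nothing beyond the quoted wording; CountingTraits, CountingString and length_calls_when_constructing are hypothetical names introduced only for this experiment.

```cpp
#include <cstddef>
#include <string>

// Hypothetical traits class: behaves like std::char_traits<char>,
// but counts how often length() is called.
struct CountingTraits : std::char_traits<char> {
    // Function-local static so the counter works without C++17 inline variables.
    static std::size_t& calls() {
        static std::size_t n = 0;
        return n;
    }

    // Hides the base-class length(); basic_string calls it as traits_type::length.
    static std::size_t length(const char* s) {
        ++calls();
        return std::char_traits<char>::length(s);
    }
};

using CountingString = std::basic_string<char, CountingTraits>;

// Returns how many times the (const char*) constructor asked for the length.
inline std::size_t length_calls_when_constructing(const char* s) {
    CountingTraits::calls() = 0;
    CountingString str(s);
    (void)str;  // only the side effect on the counter matters here
    return CountingTraits::calls();
}
```

On a conforming implementation the returned count should be at least one; the exact number is up to the library.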

gcc's implementation is:

  template<typename _CharT, typename _Traits, typename _Alloc>
    basic_string<_CharT, _Traits, _Alloc>::
    basic_string(const _CharT* __s, const _Alloc& __a)
    : _M_dataplus(_S_construct(__s, __s ? __s + traits_type::length(__s) :
                   __s + npos, __a), __a)
    { }

It uses traits_type::length, which for the default traits is std::char_traits<CharT>::length, to find the length of C-style null-terminated strings.

If you have a huge input string to pass to the constructor and you already know its length, you can use another overload that takes the size and does not calculate it again:

basic_string(const CharT* s, size_type count, ...)
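For example, if the data arrives together with its length, the (pointer, count) overload can be used directly. This is only a sketch: Slice and to_string_known_length are hypothetical names, and the length is assumed to be trustworthy.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical record: a character buffer whose length is already known,
// for instance because it came from a length-prefixed message.
struct Slice {
    const char* data;
    std::size_t size;
};

inline std::string to_string_known_length(Slice in) {
    assert(in.data != nullptr || in.size == 0);
    if (in.size == 0) return std::string();

    // (pointer, count) overload: copies exactly `size` characters and
    // does not search for a terminator, so embedded '\0' chars survive.
    return std::string(in.data, in.size);
}
```

A side effect of skipping the length scan is that the source does not need to be null-terminated at all.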

The second approach you mentioned has another disadvantage: it has to shrink the allocated memory afterwards to avoid wasting memory. That operation is expensive too.
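For comparison, a rough sketch of what the second approach would have to do is shown below. No library I know of works this way; duplicate_by_growing, GrownCopy, the starting capacity of 16 and the doubling policy are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical result type for the grow-as-you-copy sketch.
struct GrownCopy {
    std::unique_ptr<char[]> data;  // exactly length + 1 chars after the final resize
    std::size_t length;            // chars before the terminator
    std::size_t reallocations;     // extra allocations, including the final resize
};

inline GrownCopy duplicate_by_growing(const char* s,
                                      std::size_t initial_capacity = 16) {
    assert(s != nullptr && initial_capacity > 0);
    std::size_t capacity = initial_capacity;
    auto buffer = std::make_unique<char[]>(capacity);
    std::size_t n = 0;
    std::size_t reallocations = 0;

    for (; s[n] != '\0'; ++n) {
        if (n == capacity) {
            // Buffer is full: allocate a bigger one and re-copy what we have.
            auto bigger = std::make_unique<char[]>(capacity * 2);
            std::memcpy(bigger.get(), buffer.get(), n);
            buffer = std::move(bigger);
            capacity *= 2;
            ++reallocations;
        }
        buffer[n] = s[n];
    }

    if (capacity != n + 1) {
        // Final resize to the exact size so no memory is wasted.
        auto exact = std::make_unique<char[]>(n + 1);
        std::memcpy(exact.get(), buffer.get(), n);
        buffer = std::move(exact);
        ++reallocations;
    }
    buffer[n] = '\0';
    return {std::move(buffer), n, reallocations};
}
```

Even a short string that fits in the initial buffer pays for one extra allocation and copy to trim the buffer, and longer strings pay again at every doubling.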

I can't think of a sane implementation that would use the second method. Some implementations (e.g. Visual C++) do perform default initialization, which may allocate some minimum length (such as 1 or 16), and then call assign, which gets the length of the string, reallocates if necessary, and then copies the string.

Many, if not all, modern compilers use hand-tuned assembly language to get the length of a null-terminated string, which is typically extremely fast. Doing an allocate-copy-reallocate-copy cycle would be madness, really, at least on all platforms that I know of.
