Unexpected behavior involving const_cast
I came up with the following example, which exposes some unexpected behavior. I would expect that after push_back, whatever was put into the vector stays there unchanged. Instead, it looks like the compiler somehow decided to reuse the memory used by str.

Could someone explain what is happening in this example? Is this valid C++ code?

The original problem arises from code responsible for serializing and deserializing messages, which uses const_cast to remove constness. After noticing some unexpected behavior with that code, I created this simplified example, which tries to demonstrate the issue.
#include <vector>
#include <iostream>
#include <string>
using namespace std;
int main()
{
    auto str = std::string("XYZ"); // mutable string
    const auto& cstr(str); // const ref to it
    vector<string> v;
    v.push_back(cstr);
    cout << v.front() << endl; // XYZ is printed as expected
    *const_cast<char*>(&cstr[0])='*'; // this will modify the first element in the VECTOR (is this expected?)
    str[1]='#';
    cout << str << endl; // prints *#Z as expected
    cout << cstr << endl; // prints *#Z as expected
    cout << v.front() << endl; // Why *YZ is printed, not XYZ and not *#Z ?
    return 0;
}
The unexpected behavior occurs because of quirks in a deprecated implementation of std::string. Older versions of GCC implemented std::string using copy-on-write semantics. It's a clever idea, but it causes bugs like the one you're seeing. What that means is that GCC tried to define std::string so that the internal string buffer only got copied if the new std::string was modified. For example:
std::string A = "Hello, world";
std::string B = A; // No copy occurs (yet)
A[3] = '*'; // Copy occurs now because A got modified.
When you reach the characters through a const std::string (for example via a const reference), however, no copy occurs, because the library assumes that the string will not be modified through that access:
std::string A = "Hello, world";
std::string B = A;
std::string const& A_ref = A;
const_cast<char&>(A_ref[3]) = '*'; // No copy occurs (your bug)
As you've noticed, copy-on-write semantics tend to cause bugs. Because of this, and because copying a string is pretty cheap (all things considered), the copy-on-write implementation of std::string was deprecated, and GCC 5 switched to a new default implementation that does not use it.
So why are you seeing this bug if you're using GCC 5? It's likely that you're compiling and linking against an older version of the C++ standard library (one where copy-on-write is still the implementation of std::string). This is what's causing the bug for you.

Check which version of the C++ standard library you're compiling against, and if possible, update your compiler.
Which implementation of std::string is my compiler using?

- New implementation: sizeof(std::string) == 32 (when compiling for 64 bit)
- Old implementation: sizeof(std::string) == 8 (when compiling for 64 bit)

If your compiler is using the old implementation of std::string, then sizeof(std::string) is the same as sizeof(char*), because std::string is implemented as a pointer to a block of memory. That block of memory is what actually contains things like the size and capacity of the string.
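#include <cstddef>

struct string { // Old data layout (simplified sketch)
    // Illustrative placeholder offsets, counted in size_t units before the
    // character data; the real values depend on the library.
    static constexpr std::ptrdiff_t SIZE_OFFSET = 3;
    static constexpr std::ptrdiff_t CAPACITY_OFFSET = 2;

    std::size_t* _data; // the only data member: points at the character data

    std::size_t size() const {
        return *(_data - SIZE_OFFSET); // size is stored in front of the characters
    }
    std::size_t capacity() const {
        return *(_data - CAPACITY_OFFSET); // so is the capacity
    }
    char const* data() const {
        return (char const*)_data;
    }
};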
On the other hand, if you're using the newer implementation of std::string, then sizeof(std::string) should be 32 bytes (on 64-bit systems). This is because the newer implementation stores the size and capacity of the string within the std::string itself, rather than in the data it points to:
struct string { // New data layout
    char* _data;
    size_t _size;
    size_t _capacity;
    size_t _padding;
    // ...
};
What's good about the new implementation?

The new implementation has a number of benefits:

- Because std::string is 32 bytes, we can take advantage of Small String Optimization. Small String Optimization allows strings less than 16 characters long to be stored within the space normally taken up by _capacity and _padding. This avoids heap allocations, and is faster for most use cases.

For example, when GDB reports that sizeof(std::string) is 8 bytes, the old implementation of std::string is in use.