Unexpected behavior involving const_cast

I came up with the following example, which exposes some unexpected behavior. I would expect that after push_back, whatever is in the vector stays as it is. Instead, it looks as if the compiler somehow decided to reuse the memory used by str.

Could someone explain what is happening in this example? Is this valid C++ code?

The original problem arises from code responsible for serializing and deserializing messages, which uses const_cast to remove constness. After noticing some unexpected behavior with that code, I created this simplified example to demonstrate the issue.

#include <vector>
#include <iostream>
#include <string>
using namespace std;
int main()
{
    auto str = std::string("XYZ"); // mutable string
    const auto& cstr(str);         // const ref to it

    vector<string> v;
    v.push_back(cstr);

    cout << v.front() << endl;  // XYZ is printed as expected

    *const_cast<char*>(&cstr[0])='*'; // this will modify the first element in the VECTOR (is this expected?)
    str[1]='#';  // non-const access to the original string

    cout << str << endl;  // prints *#Z as expected
    cout << cstr << endl; // prints *#Z as expected
    cout << v.front() << endl; // Why *YZ is printed, not XYZ and not *#Z ?

    return 0;
}

Understanding the bug

The unexpected behavior is caused by quirks in a deprecated implementation of std::string. Older versions of GCC implemented std::string using copy-on-write semantics. It's a clever idea, but it causes bugs like the one you're seeing. Copy-on-write means that copying a std::string does not copy the internal character buffer right away: the buffer is shared, and it only gets copied once one of the strings sharing it is modified. For example:

std::string A = "Hello, world";
std::string B = A; // No copy occurs (yet)
A[3] = '*'; // Copy occurs now because A got modified.

When you access the string through a const reference, however, no copy occurs, because the library assumes that the string will not be modified through that access:

std::string A = "Hello, world"; 
std::string B = A;
std::string const& A_ref = A;

const_cast<char&>(A_ref[3]) = '*'; // No copy occurs (your bug)

As you've noticed, copy-on-write semantics tend to cause bugs. Because of this, and because copying a string is pretty cheap (all things considered), the copy-on-write implementation of std::string was deprecated and replaced as the default in GCC 5. The old implementation remains available for backward compatibility.

So why are you seeing this bug if you're using GCC 5? It's likely that you're compiling and linking against an older version of the C++ standard library (one where copy-on-write is still the implementation of std::string). That is what's causing the bug for you.

Check which version of the C++ standard library you're compiling against and, if possible, update your compiler.

How can I tell which implementation of std::string my compiler is using?

  • New GCC implementation: sizeof(std::string) == 32 (when compiling for 64 bit)
  • Old GCC implementation: sizeof(std::string) == 8 (when compiling for 64 bit)

If your compiler is using the old implementation of std::string, then sizeof(std::string) is the same as sizeof(char*), because std::string is implemented as a pointer to a block of memory. That block of memory is what actually contains things like the size and capacity of the string.

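#include <cstddef>

// Simplified sketch of the old layout, not the actual libstdc++ source.
struct string {
    // Illustrative offsets, counted in size_t units before the characters.
    static constexpr std::ptrdiff_t SIZE_OFFSET = 3;
    static constexpr std::ptrdiff_t CAPACITY_OFFSET = 2;

    std::size_t* _data; // the only member: points just past a header block
                        // holding the size, capacity, and reference count
    std::size_t size() const {
        return *(_data - SIZE_OFFSET);
    }
    std::size_t capacity() const {
        return *(_data - CAPACITY_OFFSET);
    }
    char const* data() const {
        return (char const*)_data;
    }
};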

On the other hand, if you're using the newer implementation of std::string, then sizeof(std::string) should be 32 bytes (on 64 bit systems). This is because the newer implementation stores the size and capacity of the string within the std::string itself, rather than in the data it points to:

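struct string { // New data layout (simplified)
    char* _data;       // pointer to the characters
    size_t _size;      // stored in the object itself
    size_t _capacity;  // stored in the object itself
    size_t _padding;   // brings the object to 32 bytes on 64 bit systems
    // ...
};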

What's good about the new implementation?

The new implementation has a number of benefits:

  • Accessing size and capacity can be done more quickly (since the optimizer is more likely to store them in registers, or at the very least they're likely to be in the cache)
  • Because std::string is 32 bytes, we can take advantage of Small String Optimization. Small String Optimization allows strings less than 16 characters long to be stored within the space normally taken up by _capacity and _padding. This avoids heap allocations, and is faster for most use cases.

We can see below that GDB uses the old implementation of std::string, because sizeof(std::string) returns 8 bytes:

[Image: GDB output showing sizeof(std::string) evaluating to 8]
