
Unexpected behavior involving const_cast

I came up with the following example, which exposes some unexpected behavior. I would expect that after push_back, the vector holds its own independent copy of the string. Instead, it looks like the compiler somehow decided to reuse the memory used by str.

Could someone explain what is happening in this example? Is this valid C++ code?

The original problem arises from code responsible for serializing / deserializing messages and it uses const_cast to remove constness. After noticing some unexpected behavior with that code, I created this simplified example, which tries to demonstrate the issue.

#include <vector>
#include <iostream>
#include <string>
using namespace std;
int main()
{
    auto str = std::string("XYZ"); // mutable string
    const auto& cstr(str);         // const ref to it

    vector<string> v;
    v.push_back(cstr);

    cout << v.front() << endl;  // XYZ is printed as expected

    *const_cast<char*>(&cstr[0])='*'; // this will modify the first element in the VECTOR (is this expected?)
    str[1]='#';  //

    cout << str << endl;  // prints *#Z as expected
    cout << cstr << endl; // prints *#Z as expected
    cout << v.front() << endl; // Why *YZ is printed, not XYZ and not *#Z ?

    return 0;
}

Understanding the bug

The unexpected behavior comes from a quirk in a deprecated implementation of std::string. Older versions of GCC implemented std::string using copy-on-write semantics: copying a std::string only shares the internal string buffer, and the buffer itself is copied only when one of the strings is about to be modified. It's a clever idea, but it causes bugs like the one you're seeing. For example:

std::string A = "Hello, world";
std::string B = A; // No copy occurs (yet)
A[3] = '*'; // Copy occurs now because A got modified.

When you access the string through a const reference, however, no copy occurs, because the library assumes that the string will not be modified through it:

std::string A = "Hello, world"; 
std::string B = A;
std::string const& A_ref = A;

const_cast<char&>(A_ref[3]) = '*'; // No copy occurs (your bug)

As you've noticed, copy-on-write semantics tend to cause bugs. Because of this, and because copying a string is pretty cheap (all things considered), the copy-on-write implementation of std::string was deprecated, and GCC 5 switched to a new default implementation that does not use it.

So why are you seeing this bug if you're using GCC 5? It's likely that you're compiling and linking against an older version of the C++ standard library, or against one configured to use the old ABI (where copy-on-write is still the implementation of std::string ). This is what's causing the bug for you.

Check which version of the C++ standard library you're compiling against, and if possible, update your compiler.

How can I tell which implementation of std::string my compiler is using?

  • New GCC implementation: sizeof(std::string) == 32 (when compiling for 64 bit)
  • Old GCC implementation: sizeof(std::string) == 8 (when compiling for 64 bit)

If your compiler is using the old implementation of std::string , then sizeof(std::string) is the same as sizeof(char*) because std::string is implemented as a pointer to a block of memory. The block of memory is the one that actually contains things like the size and capacity of the string.

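struct string { // Old data layout (simplified sketch, not the real source)
    size_t* _data; // points into a heap block that also holds size and capacity
    size_t size() const {
        return *(_data - SIZE_OFFSET); // SIZE_OFFSET: illustrative constant
    }
    size_t capacity() const {
        return *(_data - CAPACITY_OFFSET); // CAPACITY_OFFSET: illustrative constant
    }
    char const* data() const {
        return (char const*)_data;
    }
};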

On the other hand, if you're using the newer implementation of std::string , then sizeof(std::string) should be 32 bytes (on 64 bit systems). This is because the newer implementation stores the size and capacity of the string within the std::string itself, rather than in the data it points to:

struct string { // New data layout
    char* _data;
    size_t _size;
    size_t _capacity; 
    size_t _padding; 
    // ...
}; 

What's good about the new implementation? The new implementation has a number of benefits:

  • Accessing size and capacity can be done more quickly (since the optimizer is more likely to keep them in registers, or at the very least they're likely to be in the cache)
  • Because std::string is 32 bytes, we can take advantage of Small String Optimization. Small String Optimization allows strings shorter than 16 characters to be stored within the space normally taken up by _capacity and _padding . This avoids heap allocations, and is faster for most use cases.

We can see below that GDB uses the old implementation of std::string , because sizeof(std::string) returns 8 bytes:

[Screenshot: sizeof(std::string) reported as 8 bytes]
