No small string optimization with gcc?

Most std::string implementations (GCC's included) use the small string optimization; e.g. there's an answer discussing this.

Today, I decided to check at what point a string in code I compile gets moved to the heap. To my surprise, my test code seems to show that no small string optimization occurs at all!

Code:

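#include <iostream>
#include <string>

using std::cout;
using std::endl;

int main() {
  std::string s;

  // Capacity of a default-constructed string.
  cout << "capacity: " << s.capacity() << endl;

  // Print the buffer address next to the contents after each append;
  // a change in address means the characters moved to a new buffer.
  cout << static_cast<const void*>(s.c_str()) << " | " << s << endl;
  for (int i = 0; i < 33; ++i) {
    s += 'a';
    cout << static_cast<const void*>(s.c_str()) << " | " << s << endl;
  }
}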

The output of g++ test.cc && ./a.out is

capacity: 0
0x7fe405f6afb8 | 
0x7b0c38 | a
0x7b0c68 | aa
0x7b0c38 | aaa
0x7b0c38 | aaaa
0x7b0c68 | aaaaa
0x7b0c68 | aaaaaa
0x7b0c68 | aaaaaaa
0x7b0c68 | aaaaaaaa
0x7b0c98 | aaaaaaaaa
0x7b0c98 | aaaaaaaaaa
0x7b0c98 | aaaaaaaaaaa
0x7b0c98 | aaaaaaaaaaaa
0x7b0c98 | aaaaaaaaaaaaa
0x7b0c98 | aaaaaaaaaaaaaa
0x7b0c98 | aaaaaaaaaaaaaaa
0x7b0c98 | aaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0cd8 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
0x7b0d28 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

I'm guessing that the larger first pointer, i.e. 0x7fe405f6afb8, is a stack pointer, and the other ones point to the heap. Running this many times produces the same pattern: the first address is always large and the others are smaller, though the exact values usually differ. The smaller addresses always follow the standard power-of-2 allocation scheme, e.g. 0x7b0c38 is listed once, then 0x7b0c68 once, then 0x7b0c38 twice, then 0x7b0c68 4 times, then 0x7b0c98 8 times, etc.

After reading Howard's answer, and since I'm on a 64-bit machine, I was expecting to see the same address printed for the first 22 characters, and only then to see it change.

Am I missing something?

Also, interestingly, if I compile with -O (at any level), the first line (the empty string) shows a constant small pointer value, 0x6021f8, instead of the large value, and this 0x6021f8 doesn't change no matter how many times I run the program.

Output of g++ -v :

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/foo/bar/gcc-6.2.0/gcc/libexec/gcc/x86_64-redhat-linux/6.2.0/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../gcc-6.2.0/configure --prefix=/foo/bar/gcc-6.2.0/gcc --build=x86_64-redhat-linux --disable-multilib --enable-languages=c,c++,fortran --with-default-libstdcxx-abi=gcc4-compatible --enable-bootstrap --enable-threads=posix --with-long-double-128 --enable-long-long --enable-lto --enable-__cxa_atexit --enable-gnu-unique-object --with-system-zlib --enable-gold
Thread model: posix
gcc version 6.2.0 (GCC)

One of your flags is:

--with-default-libstdcxx-abi=gcc4-compatible

which makes the old, GCC4-compatible std::string implementation the default, and GCC4 does not support small string optimization.


GCC5 started supporting it. isocpp states:

A new implementation of std::string is enabled by default, using the small string optimization instead of copy-on-write reference counting.

which supports my claim.

Moreover, Exploring std::string mentions:

As we see, older libstdc++ implements copy-on-write, and so it makes sense for them to not utilize small objects optimization.

and then the discussion shifts once GCC5 comes into play.
