
Why is std::vector<char> faster than std::string?

I wrote a small test to compare how fast a container can be resized and then filled using std::generate_n. I'm comparing std::string and std::vector<char>. Here is the program:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <random>
#include <string>
#include <vector>

int main()
{
    std::random_device rd;
    std::default_random_engine rde(rd());
    std::uniform_int_distribution<int> uid(0, 25);

    #define N 100000

#ifdef STRING
    std::cout << "String.\n";
    std::string s;
    s.resize(N);
    std::generate_n(s.begin(), N, 
                    [&]() { return (char)(uid(rde) + 65); });
#endif

#ifdef VECTOR
    std::cout << "Vector.\n";
    std::vector<char> v;
    v.resize(N);
    std::generate_n(v.begin(), N, 
                    [&]() { return (char)(uid(rde) + 65); });
#endif

    return 0;
}

And my Makefile:

test_string:
    g++ -std=c++11 -O3 -Wall -Wextra -pedantic -pthread -o test test.cpp -DSTRING
    valgrind --tool=callgrind --log-file="test_output" ./test
    cat test_output | grep "refs"

test_vector:
    g++ -std=c++11 -O3 -Wall -Wextra -pedantic -pthread -o test test.cpp -DVECTOR
    valgrind --tool=callgrind --log-file="test_output" ./test
    cat test_output | grep "refs"

And the comparisons for certain values of N:

N=10000
String: 1,865,367
Vector: 1,860,906

N=100000
String: 5,295,213
Vector: 5,290,757

N=1000000
String: 39,593,564
Vector: 39,589,108

std::vector<char> comes out ahead every time. Since it seems to be more performant, what is even the point of using std::string?

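I used #define N 100000000 and ran each scenario 3 times. With clang++, string is clearly faster in every run; with g++, the two are within noise of each other. I measured plain run time rather than using Valgrind, which does not make sense for this kind of comparison.

OS: Ubuntu 14.04. Arch: x86_64. CPU: Intel(R) Core(TM) i5-4670 CPU @ 3.40GHz.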

$COMPILER -std=c++11 -O3 -Wall -Wextra -pedantic -pthread -o test x.cc -DVECTOR    
$COMPILER -std=c++11 -O3 -Wall -Wextra -pedantic -pthread -o test x.cc -DSTRING

Times:

compiler/variant    | run 1   | run 2   | run 3
--------------------+---------+---------+--------
g++ 4.8.2/vector    | 1.724s  | 1.704s  | 1.669s
g++ 4.8.2/string    | 1.675s  | 1.678s  | 1.674s
clang++ 3.5/vector  | 1.929s  | 1.934s  | 1.905s
clang++ 3.5/string  | 1.616s  | 1.612s  | 1.619s

"std::vector comes out ahead every time. Since it seems to be more performant, what is even the point of using std::string?"

Even if we suppose that your observation holds true for a wide range of different systems and different application contexts, it would still make sense to use std::string for various reasons, all of which are rooted in the fact that a string has different semantics than a vector. A string is a piece of text (at least simple, non-internationalised English text); a vector is a collection of characters.

Two things come to mind:

  • Ease of use. std::string can be constructed from string literals, has a lot of convenient operators, and can be used with string-specific algorithms. Try writing std::string x = "foo" + ("bar" + boost::algorithm::replace_all_copy(f(), "abc", "ABC").substr(0, 10)); with a std::vector<char>...

  • Small-String Optimization (SSO). std::string is implemented with SSO in MSVC (and in other standard libraries such as libc++, and libstdc++ since GCC 5), which eliminates heap allocation entirely in many cases. SSO is based on the observation that strings are often very short, which certainly cannot be said about vectors.

Try the following:

#include <iostream>
#include <vector>
#include <string>

int main()
{
    char const array[] = "short string";

#ifdef STRING
    std::cout << "String.\n";
    for (int i = 0; i < 10000000; ++i) {
        std::string s = array;
    }
#endif

#ifdef VECTOR
    std::cout << "Vector.\n";
    for (int i = 0; i < 10000000; ++i) {
        std::vector<char> v(std::begin(array), std::end(array));
    }
#endif
}

The std::string version should outperform the std::vector version, at least with MSVC. The difference is about 2-3 seconds on my machine. For longer strings, the results should be different.

Of course, this does not really prove anything either, except two things:

  • Performance tests depend a lot on the environment.
  • Performance tests should test what will realistically be done in a real program. In the case of strings, your program may deal with many small strings rather than a single huge one, so test small strings.
