std::string in a multi-threaded program

Given that:

1) The C++03 standard does not address the existence of threads in any way

2) The C++03 standard leaves it up to implementations to decide whether std::string should use Copy-on-Write semantics in its copy-constructor

3) Copy-on-Write semantics often lead to unpredictable behavior in a multi-threaded program

I come to the following, seemingly controversial, conclusion:

You simply cannot safely and portably use std::string in a multi-threaded program

Obviously, no STL data structure is thread-safe. But at least, with std::vector for example, you can simply use mutexes to protect access to the vector. With an std::string implementation that uses COW, you can't even reliably do that without editing the reference counting semantics deep within the vendor implementation.

Real-world example:

In my company, we have a multi-threaded application which has been thoroughly unit-tested and run through Valgrind countless times. The application ran for months with no problems whatsoever. One day, I recompile the application on another version of gcc, and all of a sudden I get random segfaults all the time. Valgrind is now reporting invalid memory accesses deep within libstdc++, in the std::string copy constructor.

So what is the solution? Well, of course, I could typedef std::vector<char> as a string class - but really, that sucks. I could also wait for C++0x, which I pray will require implementors to forgo COW. Or, (shudder), I could use a custom string class. I personally always rail against developers who implement their own classes when a preexisting library will do fine, but honestly, I need a string class which I can be sure is not using COW semantics; and std::string simply doesn't guarantee that.

Am I right that std::string simply cannot be used reliably at all in portable, multi-threaded programs? And what is a good workaround?

You cannot safely and portably do anything in a multi-threaded program. There is no such thing as a portable multi-threaded C++ program, precisely because threads throw everything C++ says about order of operations, and the results of modifying any variable, out the window.

There's also nothing in the standard to guarantee that vector can be used in the way you say. It would be legal to provide a C++ implementation with a threading extension in which, say, any use of a vector outside the thread in which it was initialized results in undefined behavior. The instant you start a second thread, you aren't using standard C++ any more, and you must look to your compiler vendor for what is safe and what is not.

If your vendor provides a threading extension, and also provides a std::string with COW that (therefore) cannot be made thread-safe, then I think for the time being your argument is with your vendor, or with the threading extension, not with the C++ standard. For example, arguably POSIX should have barred COW strings in programs which use pthreads.

You could possibly make it safe by having a single mutex, which you take while doing any string mutation whatsoever, and any reads of a string that's the result of a copy. But you'd probably get crippling contention on that mutex.
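A minimal sketch of that single-mutex idea follows. It assumes C++11's std::mutex as a stand-in for whatever lock the platform's threading extension provides (for example a pthread mutex), and the `guarded` namespace and its helper names are hypothetical. It only illustrates the shape of the approach. Whether this is sufficient for a particular COW implementation still depends on the vendor.

```cpp
#include <mutex>
#include <string>

// Hypothetical helpers: every string copy and mutation goes through one lock.
// std::mutex (C++11) is an assumption; C++03 code would use the platform's
// own mutex type instead.
namespace guarded {

// The single process-wide lock shared by all guarded string operations.
inline std::mutex& string_mutex() {
    static std::mutex m;
    return m;
}

// Copy a string while holding the global lock, so that any hidden
// reference-count update happens inside the critical section.
inline std::string copy(const std::string& src) {
    std::lock_guard<std::mutex> lock(string_mutex());
    return std::string(src);
}

// Mutate a string while holding the same lock.
inline void append(std::string& dst, const std::string& tail) {
    std::lock_guard<std::mutex> lock(string_mutex());
    dst += tail;
}

}  // namespace guarded
```

Because every thread funnels through the same lock, this is exactly where the contention mentioned above would show up.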

You are right. This will be fixed in C++0x. For now you have to rely on your implementation's documentation. For example, recent libstdc++ versions (GCC) let you use string objects as if no string object shared its buffer with another one. C++0x forces a library implementation to protect the user from "hidden sharing".

Given that the standard doesn't say a word about memory models and is completely thread-unaware, you can't assume that every implementation is non-COW. So no, you can't.

Apart from that, if you know your tools, most implementations use non-COW strings to allow multi-threading.

A more correct way to look at it would be "You cannot safely and portably use C++ in a multithreaded environment". There is no guarantee that other data structures will behave sensibly either. Or that the runtime won't blow up your computer. The standard doesn't guarantee anything about threads.

So to do anything with threads in C++, you have to rely on implementation-defined guarantees. And then you can safely use std::string, because each implementation tells you whether or not it is safe to use in a threaded environment.

You lost all hope of true portability the moment you spawned a second thread. std::string isn't "less portable" than the rest of the language/library.

You can use STLport. It provides non-COW strings. And it has the same behavior on different platforms.

This article presents a comparison of STL strings with copy-on-write and non-copy-on-write algorithms, based on STLport strings, ropes and GNU libstdc++ implementations.

At the company where I work, I have some experience running the same server application built with and without STLport on HP-UX 11.31. The application was compiled with gcc 4.3.1 at optimization level O2. The program built with STLport processes requests 25% faster than the same program built without STLport (which uses gcc's own STL library).

I profiled both versions and found that the version without STLport spends much more time in pthread_mutex_unlock() (2.5%) than the version with STLport (1%). In the version without STLport, pthread_mutex_unlock() is called from one of the std::string functions.

However, when, after profiling, I changed the string assignments in the most frequently called functions in this way:

string_var = string_var.c_str(); // added .c_str()

there was a significant improvement in the performance of the version without STLport.

I regulate the string access:

  • make std::string members private
  • return const std::string& for getters
  • setters modify the member

This has always worked fine for me and is correct data hiding.
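The access pattern in the list above could look like the sketch below. The class name `Account` and its member names are hypothetical. Note that this only restricts who can touch the string. It adds no locking of its own, and a caller who copies the returned reference still goes through the std::string copy constructor.

```cpp
#include <string>

// Hypothetical class illustrating the three rules: private member,
// const-reference getter, setter that modifies the member.
class Account {
public:
    explicit Account(const std::string& name) : name_(name) {}

    // Getter hands out a read-only reference; no copy is made here.
    const std::string& name() const { return name_; }

    // Setter is the only way to change the stored string.
    void set_name(const std::string& name) { name_ = name; }

private:
    std::string name_;  // never exposed for direct modification
};
```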

If you want to disable COW semantics, you could force your strings to make copies:

// instead of:
string newString = oldString;

// do this:
string newString = oldString.c_str();

As pointed out, you should use the iterator constructor instead, especially if the string could contain embedded nulls:

string newString(oldString.begin(), oldString.end());
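To make the embedded-null caveat concrete, here is a small sketch comparing the two copy styles. The helper names `copy_via_c_str` and `copy_via_iterators` are made up for illustration. The c_str() route goes through the `const char*` constructor, which stops at the first null character, while the iterator route copies the full range.

```cpp
#include <string>

// Hypothetical helper: copies through a C string, so the copy ends at the
// first '\0' found in the source.
inline std::string copy_via_c_str(const std::string& s) {
    return std::string(s.c_str());
}

// Hypothetical helper: copies the whole [begin, end) range, including any
// embedded '\0' characters.
inline std::string copy_via_iterators(const std::string& s) {
    return std::string(s.begin(), s.end());
}
```

For a five-character string built as `std::string("ab\0cd", 5)`, the first helper yields a string of length 2 and the second one of length 5.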

In MSVC, std::string is no longer a reference-counted shared pointer to a container. They chose to copy the entire contents by value in every copy constructor and assignment operator, to avoid multithreading problems.
