Direct assignment vs Dynamic allocation for char * string

I have been using C++ for quite some time now, but there is one fundamental concept that I still do not understand. First I will list two ways of assigning a text string to a char *.

Method 1:

char * str = "Hello World";

Method 2:

char * str = new char [12];
strcpy(str,"Hello World");

I am very familiar with method 2. Method 1 is the one that gives me a headache. My questions are:

  1. What's the fundamental difference between these two methods? Any advantages/disadvantages?
  2. Should I clean the memory manually for method 1?
  3. What is the life span for the string in method 1? Can I trust it to persist as long as the pointer is still valid? Can I change the content (provided I do not run over the '\0' at the end)?

I have read countless C++ textbooks and articles. They all tell me method 1 works with no elaboration on the repercussions. My own experimentation does not yield convincing results.

Thank you (and maybe excuse me for my bad English)

Edit : Actually I am programming using WinAPI with tchar string in VS2015, and method 1 compiles perfectly. The std::string is terrible with Unicode handling.

Imagine you have two projects in a solution, one built as Unicode and the other as multi-byte, and both use the same library. Inside this library it is convenient to use tchar; with std::string you must explicitly say which character type you mean.

I have to do this because the multi-byte project is a dll I need to inject into another app. The unicode version of the DLL will crash the app, only the multi-byte works.

char * str = "Hello World";

Is deprecated in C++ as it violates const correctness. "Hello World" is a const char[] and pointing to it with a char* is an invitation to undefined behavior as you could try to modify it. If you want to work with strings in C++ I suggest you use std::string which prevents you from falling into the numerous pitfalls c-strings have.

If you do need a c-string then you can use

char str[] = "Hello World";

Which will create a char array of the correct length and allow you to modify the contents.
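
As a minimal sketch of the array form (the helper name `make_jello` is hypothetical, not from the original post): the array is a private, writable copy of the literal, and its deduced size includes the terminator.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper for illustration: copies the literal into a local
// array, edits the copy, and returns the result as a std::string.
std::string make_jello() {
    char str[] = "Hello World";      // array of 12 chars: 11 characters + '\0'
    static_assert(sizeof(str) == 12, "size is deduced from the literal");
    str[0] = 'J';                    // fine: this modifies the array, not the literal
    assert(std::strlen(str) == 11);  // strlen does not count the terminator
    return std::string(str);
}
```

Calling `make_jello()` should yield "Jello World"; the literal itself is never touched.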

Edit: Actually I am programming using WinAPI with tchar string in VS2015, and method 1 compiles perfectly. The std::string is terrible with Unicode handling.

Nothing stops method 1 from compiling on the majority of compilers, but if you want to conform to the standard then you need to stop using it. It is deprecated and eventually (hopefully) compiler support for it will be removed.

If you need Unicode support then use a std::wstring which wraps a wchar_t*

My answer concerns C++. Some details are different in C.

  1. What's the fundamental difference between these two methods? Any advantages/disadvantages?

Let's take a look at your first code:

char * str = "Hello World";

This is ill-formed. You may not assign a string literal to a pointer to non-const char, at least not since C++11. Prior to that, the conversion was merely deprecated.

This:

const char* str = "Hello World";

Would be correct. But, if you need to modify the string, then this is not an option.

Edit: Actually I am programming using WinAPI with tchar string in VS2015, and method 1 compiles perfectly.

Even if your compiler supports the conversion, doing so is quite dangerous because you may accidentally end up modifying the string literal, which is bad because modifying a string literal has undefined behaviour.

Let's see your second code:

char * str = new char [12];
strcpy(str,"Hellow World");

This invokes undefined behaviour. The string literal is 13 characters long (because of the terminating null character), so the strcpy overflows the 12-element array.

Edit: The code in question is fixed now, but this demonstrates well, why manually specifying the size is error prone.
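
To make the off-by-one concrete, here is a small sketch (the helper name `buffer_size_for` is hypothetical): `sizeof` on the literal counts the terminator, while `strlen` does not, so a buffer sized by hand from the visible characters comes up one short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: number of chars a buffer needs to hold a copy of
// `text`, including the terminating '\0' that strlen does not count.
std::size_t buffer_size_for(const char* text) {
    assert(text != nullptr);
    return std::strlen(text) + 1;
}

// The literal from the original question: 12 visible characters plus '\0'.
static_assert(sizeof("Hellow World") == 13, "12 characters + terminator");
```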

I recommend a simpler approach:

char str[] = "Hello World";

This is more concise, and leaves no possibility of using a wrong sized array. It is also more efficient than dynamic allocation, but not as efficient as using the string literal directly. However, unlike a string literal, you may modify this array.

If the array is local, then it is destroyed at the end of its scope. Also, you cannot resize the array. If you need a resizable string, then you do need dynamic allocation, and in that case I recommend std::string:

std::string str("Hello World");

  2. Should I clean the memory manually for method 1?

No, you should not. String literals have static storage duration.

  3. What is the life span for the string in method 1? Can I trust it to persist as long as the pointer is still valid?

You can trust that the string literal exists throughout the entire execution of your program.
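
A short sketch of what that lifetime guarantee allows (the function names `greeting` and `use_greeting` are hypothetical): a pointer to a literal can safely be returned from a function, because only the pointer variable is local.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical function: returns a pointer to a string literal. The literal
// has static storage duration, so the pointer stays valid after the call.
const char* greeting() {
    const char* local = "Hello World";  // `local` itself ends with this scope
    return local;                       // the literal it points to does not
}

void use_greeting() {
    const char* p = greeting();
    assert(std::strcmp(p, "Hello World") == 0);  // still readable here
}
```

By contrast, returning a pointer to a local `char str[] = "Hello World";` would leave the caller with a dangling pointer, since that array is destroyed at the end of the function.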

Can I change the content (provided I do not run over the '\0' at the end)?

Modifying a string literal would have undefined behaviour. You don't want undefined behaviour anywhere near your program.

The std::string is terrible with Unicode handling.

std::string has exactly the same unicode handling as plain character arrays do.

Imagine you have two projects in a solution, one built as Unicode and the other as multi-byte, and both use the same library. Inside this library it is convenient to use tchar; with std::string you must explicitly say which character type you mean.

I would avoid using TCHAR at all except when dealing with the Windows API. But if you do use it and need the niceties of std::string, then you can simply use std::basic_string<TCHAR>.
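
A rough sketch of that idea follows. So that it compiles without the Windows headers, it uses stand-ins: `tchar_t`, `tstring`, `TSTR` and `make_title` are all hypothetical names. In a real Windows project the SDK's TCHAR type and the `_T()` macro from `<tchar.h>` would play the roles of `tchar_t` and `TSTR`.

```cpp
#include <cassert>
#include <string>

// Stand-ins for TCHAR and _T() so the sketch is portable (assumed names).
#ifdef UNICODE
using tchar_t = wchar_t;
#define TSTR(s) L##s
#else
using tchar_t = char;
#define TSTR(s) s
#endif

// One string type that follows the project's character-set setting.
using tstring = std::basic_string<tchar_t>;

tstring make_title() {
    tstring title = TSTR("Hello");
    title += TSTR(" World");     // same interface for either character type
    assert(title.size() == 11);
    return title;
}
```

The library code is written once against `tstring`, and each project picks the character type through its own build settings.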

In this declaration

char * str = "Hello World";

which is valid in C but not in C++, two objects are created.

First of all the compiler creates a zero-terminated character array with the static storage duration for string literal "Hello World" .

In C, string literals have types of non-constant character arrays, while in C++ they have types of constant character arrays.

Nevertheless, you may not modify a string literal in either C or C++. Any attempt to modify a string literal results in undefined behavior.

That also means that you must not free the memory occupied by a string literal. It is the compiler that reserves the memory for a string literal.

In C the string literal used in the declaration has type char[12] while in C++ it has type const char[12] .

Thus in C++ the declaration will look like

const char * str = "Hello World";
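
The C++ array type mentioned above can be checked at compile time. This is only an illustrative sketch, and the helper name `first_char` is hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// In C++ a string literal is an lvalue of type const char[N], so decltype
// yields a reference to that array type. N counts the terminating '\0'.
static_assert(std::is_same<decltype("Hello World"), const char (&)[12]>::value,
              "C++ string literals are arrays of const char");

// Hypothetical helper: the array decays to a pointer to its first element.
const char* first_char() {
    const char* str = "Hello World";
    assert(str[11] == '\0');  // the 12th element is the terminator
    return str;
}
```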

The second object created by the declaration is the pointer named str, which points to the first character of the string literal. The pointer itself can be changed; that is, it can be reassigned.

If the pointer is declared in a code block then it has automatic storage duration. The storage duration of the pointer does not affect the storage duration of the string literal, which, as mentioned above, is static.

In case of the first approach

 char * str = "Hello World";

you are storing the address of the string literal into the given pointer. However, due to the mismatch of the type const char[] vs char * , this construct is illegal.

Remember, the content at that memory address must not be modified; any attempt to do so invokes undefined behavior. Also, you don't need to free anything, as you did not allocate any dynamic memory.

In the second approach,

char * str = new char [12];
strcpy(str,"Hellow World");

you are allocating dynamic memory, storing its address in the pointer, and filling it with the content of the string literal. This array is perfectly writable. Note, however, that with a dimension of 12 there is no room for the null terminator; the size must be at least 13. Finally, you need to free the allocated memory with delete[] after use.
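
A possible corrected version of the second approach is sketched below; the helper names `duplicate` and `shout` are hypothetical. The buffer size is derived from the text rather than typed by hand, and the allocation is released with `delete[]`.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: heap-allocates a writable copy of `text`.
// The caller owns the result and must release it with delete[].
char* duplicate(const char* text) {
    assert(text != nullptr);
    char* copy = new char[std::strlen(text) + 1];  // +1 for the terminator
    std::strcpy(copy, text);
    return copy;
}

std::string shout() {
    char* str = duplicate("Hellow World");  // 13 chars allocated
    str[5] = '!';                           // the heap copy is writable
    std::string result(str);
    delete[] str;                           // new[] pairs with delete[]
    return result;
}
```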

What is the life span for the string in method 1?

For the string literal itself, the lifetime of the program; storage for the literal is allocated when the program starts (maybe even as soon as the program is loaded into memory) and released when the program exits.

Can I trust it to persist as long as the pointer is still valid?

You can trust the literal to persist regardless of the lifetime of the pointer variable str .

Can I change the content (provided I do not run over the '\0' at the end)?

No. C++ string literals are arrays of const char , meaning they cannot be modified (which would defeat the entire purpose of them being a literal ; it's logically the same thing as changing the content of 42 ).

What's the fundamental difference between these two methods?

The first method does not set aside any new memory, and the contents of what str points to may not be modified.

The second method dynamically allocates a new block of memory and copies the contents of the string literal to it; you may modify the contents of the allocated block to your heart's content.

Any advantages/disadvantages?

Use the first method to create symbolic constants for string literals (which you want to do - I've been burned by misspelling literals more than once).
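
For instance, a symbolic constant might look like the sketch below (`kGreeting` and `is_greeting` are hypothetical names). A misspelled constant name is a compile error, whereas a misspelled literal compiles silently.

```cpp
#include <cassert>
#include <cstring>

// One named constant instead of repeating the literal throughout the code.
constexpr const char* kGreeting = "Hello World";

// Hypothetical helper: compares a C string against the constant.
bool is_greeting(const char* s) {
    assert(s != nullptr);
    return std::strcmp(s, kGreeting) == 0;
}
```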

There aren't many good use cases for the second method; if you need to manipulate text data, use the std::string type instead of arrays of char . C-style string handling is a massive pain in the ass , and the std::string type makes life much easier in that respect. If you need to create and store multiple strings, use a standard container like std::vector .
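
As a brief sketch of that recommendation (the function name `make_greetings` is hypothetical), each std::string manages its own memory and grows as needed, so there is no size to compute and nothing to delete.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds and edits a small collection of strings.
std::vector<std::string> make_greetings() {
    std::vector<std::string> greetings;
    greetings.push_back("Hello World");
    greetings.push_back("Hello");
    greetings[1] += " again";        // the string reallocates if it has to
    greetings[0][0] = 'J';           // contents are freely modifiable
    assert(greetings.size() == 2);
    return greetings;                // storage is released automatically later
}
```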
