
Defining a string with no null-terminating char (\0) at the end

What are the various ways in C/C++ to define a string with no null-terminating char (\0) at the end?

EDIT: I am interested in character arrays only and not in STL string.

Typically, as another poster wrote:

char s[6] = {'s', 't', 'r', 'i', 'n', 'g'};

or, if your execution character set is ASCII, which is usually true (there is not much EBCDIC around today):

char s[6] = {115, 116, 114, 105, 110, 103};

There is also a largely ignored way that works only in C (not in C++):

char s[6] = "string";

If the array is too small to hold the final 0 (but large enough to hold all the other characters of the string literal), the final zero is not copied. This is still valid C, but it is invalid C++.
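To make that difference visible, here is a small sketch that compiles only as C (a C++ compiler rejects the first declaration, and recent C compilers may warn about it because it is often a mistake). The names no_nul, with_nul and the two helpers are hypothetical; sizeof shows that the sized array drops the '\0' while the unsized one keeps it.

```c
#include <stddef.h>

/* Valid C, invalid C++: the literal has 6 letters and the array has room
   for exactly 6, so the terminating '\0' is silently dropped. */
static const char no_nul[6] = "string";

/* With the size left out, the compiler counts the '\0' as well. */
static const char with_nul[] = "string";

/* Hypothetical helpers returning the size of each array in bytes. */
size_t size_without_nul(void) { return sizeof no_nul; }   /* 6 */
size_t size_with_nul(void)    { return sizeof with_nul; } /* 7 */
```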

Obviously you can also do it at run time:

char s[6];
s[0] = 's';
s[1] = 't';
s[2] = 'r';
s[3] = 'i';
s[4] = 'n';
s[5] = 'g';

or (same remark on ASCII charset as above)

char s[6];
s[0] = 115;
s[1] = 116;
s[2] = 114;
s[3] = 105;
s[4] = 110;
s[5] = 103;

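Or using memcpy (or memmove, or bcopy, but in this case there is no benefit in doing that):

memcpy(s, "string", 6);   /* needs <string.h>; copies the 6 letters, not the '\0' */

or strncpy:

strncpy(s, "string", 6);  /* also stops after 6 chars, so no '\0' is written */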
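A small sketch, assuming a standard C library with <string.h>, showing that both calls above leave the same six bytes in the buffer and no terminator. The function names are hypothetical.

```c
#include <string.h>

/* Fill a 6-byte buffer with "string" and no terminator, two ways. */
void fill_with_memcpy(char dst[6])
{
    memcpy(dst, "string", 6);   /* exactly 6 bytes; the '\0' would be the 7th */
}

void fill_with_strncpy(char dst[6])
{
    /* strncpy writes at most n = 6 bytes; because the source is at least
       6 characters long, it does NOT append a '\0'. */
    strncpy(dst, "string", 6);
}
```

Note the asymmetry: if the source were shorter than n, strncpy would pad the rest of the buffer with '\0' bytes, so it only produces an unterminated result when the source has at least n characters.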

What should be understood is that there is no string type in C (in C++ there are string objects, but that's a completely different story). So-called strings are just char arrays. Even the name char is misleading: it is not a character but just a small numeric type. We could probably have called it byte instead, but in the old days there was unusual hardware around with 9-bit registers and such, and "byte" is usually taken to mean 8 bits.

Because char is very often used to store a character code, the C designers provided a simpler way than storing the number in a char directly: you can put a letter between single quotes, and the compiler understands that it must store that character's code in the char.

What I mean is (for example) that you don't have to do

char c = '\0';

To store a code 0 in a char, just do:

char c = 0;
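A quick check of that equivalence, as a hedged sketch: the helper names are hypothetical, and the arithmetic on 'a' assumes a character set in which 'b' immediately follows 'a' (true in ASCII).

```c
#include <stdbool.h>

/* '\0' is just a character constant whose value is 0. */
bool nul_is_zero(void)
{
    char a = '\0';
    char b = 0;
    return a == b;      /* always true */
}

/* char takes part in ordinary integer arithmetic. */
int next_code(char c)
{
    return c + 1;
}
```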

Because we very often have to work with runs of chars of variable length, the C designers also chose a convention for "strings": just put a code 0 where the text should end. This representation has a name, "zero-terminated string", and if you see the two letters sz at the beginning of a variable name, it usually means that its content is a zero-terminated string.

"C sz strings" are not a type at all, just arrays of chars, as ordinary as, say, an array of int. But the string manipulation functions (strcmp, strcpy, strcat, printf and many, many others) understand and rely on the 0-ending convention. That also means that if you have a char array that is not zero-terminated, you shouldn't pass it to any of these functions: they will read past the end of the array looking for a 0, which is undefined behavior. Otherwise you must be extra careful and use functions that take an explicit length, such as those with an n in their name like strncpy.

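A minimal sketch of both sides of the convention, assuming a hosted C99 compiler. my_strlen is a hypothetical name that mimics what strlen() does (scan for the 0 code); the other two functions use standard calls that take an explicit length and are therefore safe on an array with no terminator: the precision in "%.*s" limits how many bytes printf reads, and memcmp never looks for a 0.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical re-implementation of strlen(): walk the array until
   the 0 code that marks the end of the text. */
size_t my_strlen(const char *sz)
{
    size_t n = 0;
    while (sz[n] != 0)
        n++;
    return n;
}

/* A 6-byte array holding "string" with no terminating 0.
   Passing it to strlen(), strcpy() or printf("%s") is undefined. */
static const char word[6] = {'s', 't', 'r', 'i', 'n', 'g'};

/* The precision caps how many bytes printf may read. */
void print_word(void)
{
    printf("%.*s\n", (int)sizeof word, word);
}

/* memcmp compares exactly len bytes and ignores terminators. */
int word_equals(const char *other, size_t len)
{
    return len == sizeof word && memcmp(word, other, len) == 0;
}
```

With char buf[] = "hi\0there";, my_strlen(buf) returns 2: everything after the embedded 0 is invisible under the zero-terminated convention, even though sizeof buf is 9.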
The biggest problem with this convention is that there are many cases where it is inefficient. A typical example: you want to append something to the end of a zero-terminated string. If you had kept the size, you could jump straight to the end of the string; with the sz convention, you have to scan it char by char. Other kinds of problems occur when dealing with encoded Unicode and the like. But at the time C was created, this convention was very simple and did the job perfectly.
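The alternative described above (keep the size) can be sketched as a hypothetical length-tracked buffer; struct lbuf, its fixed 64-byte capacity and lbuf_append are illustrative assumptions, not a standard API. Appending jumps straight to data + len instead of scanning for a 0 the way strcat must.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-tracked buffer; the text is never 0-terminated. */
struct lbuf {
    char   data[64];
    size_t len;        /* number of bytes in use */
};

/* Append n bytes in O(n), independent of how long the buffer already is.
   Returns 0 on success, -1 if the text would not fit. */
int lbuf_append(struct lbuf *b, const char *text, size_t n)
{
    if (n > sizeof b->data - b->len)
        return -1;
    memcpy(b->data + b->len, text, n);  /* no terminator written */
    b->len += n;
    return 0;
}
```

The trade-off is the one discussed further down the thread: every buffer now carries a length field, in exchange for constant-time access to the end of the text.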

Nowadays, in C++, a string literal such as "string" is no longer a plain char array as in the past but a const char[7] (which decays to const char *). That means that what the pointer points to is a constant that should not be modified (if you want to modify it, you must first copy it), and that is a good thing because it helps to detect many programming errors at compile time. In C the literal's type is still char[7], but modifying it is undefined behavior all the same.
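A minimal sketch of the "copy it first" advice, assuming the caller supplies a 7-byte destination; title_case_copy is a hypothetical name. The commented-out lines show the pattern to avoid.

```c
#include <string.h>

/* Writing through a pointer to the literal itself, e.g.
       char *p = "string"; p[0] = 'S';
   is undefined behavior in C and ill-formed in C++ since C++11,
   because the literal is read-only. Copying it first is fine. */
void title_case_copy(char dst[7])
{
    memcpy(dst, "string", 7);   /* 7 bytes: the 6 letters plus the '\0' */
    dst[0] = 'S';               /* modifies our copy, not the literal */
}
```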

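C++ std::string objects were not required to be NUL-terminated internally before C++11 (only c_str() had to provide a terminated array); since C++11, the character buffer is guaranteed to have a '\0' after the last character.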

PS: NULL is a macro [1]. NUL is '\0'. Don't mix them up.

1: C.2.2.3 Macro NULL

The macro NULL, defined in any of <clocale>, <cstddef>, <cstdio>, <cstdlib>, <cstring>, <ctime>, or <cwchar>, is an implementation-defined C++ null pointer constant in this International Standard (18.1).

The terminating null is there to mark the end of the string. Without it, you need some other method to determine its length.

You can use a predefined length:

char s[6] = {'s','t','r','i','n','g'};

You can emulate Pascal-style strings:

unsigned char s[7] = {6, 's','t','r','i','n','g'};
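A hedged sketch of working with such a Pascal-style string; pstr_make and pstr_print are hypothetical helpers. Byte 0 holds the length, so the maximum length is UCHAR_MAX (255 with 8-bit chars), and the reader takes the length from that byte instead of searching for a '\0'.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Build a Pascal-style string from a pointer and length.
   dst must have room for len + 1 bytes. Returns -1 if len is too big. */
int pstr_make(unsigned char *dst, const char *src, size_t len)
{
    if (len > UCHAR_MAX)
        return -1;              /* the length must fit in byte 0 */
    dst[0] = (unsigned char)len;
    memcpy(dst + 1, src, len);
    return 0;
}

/* Print it: the length comes from byte 0, not from a search for '\0'. */
void pstr_print(const unsigned char *p)
{
    fwrite(p + 1, 1, p[0], stdout);
    putchar('\n');
}
```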

You can use std::string in C++ (though you said you're not interested in std::string).

Preferably, you would use some existing technology that handles Unicode, or at least understands string encodings (e.g., <wchar.h>).

And a comment: if you're putting this in a program intended to run on an actual computer, you might consider typedef-ing your own "string" type. This will encourage your compiler to barf if you ever accidentally try to pass it to a function expecting a C-style string.

typedef struct {
    char characters[10];   /* in C the array size goes after the name */
} ThisIsNotACString;
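A sketch of how such a wrapper might be used. The length field is an assumption added here (the struct in the answer holds only the characters), and print_not_c_string is a hypothetical helper. Because functions like strlen() expect a char pointer, passing a ThisIsNotACString to them by mistake draws a compiler diagnostic instead of silently reading past the end.

```c
#include <stddef.h>
#include <stdio.h>

/* Variant of the wrapper with an explicit length (an assumption). */
typedef struct {
    char   characters[10];
    size_t length;          /* number of valid bytes in characters */
} ThisIsNotACString;

/* Functions written for the wrapper take the struct, not a char pointer. */
void print_not_c_string(const ThisIsNotACString *s)
{
    fwrite(s->characters, 1, s->length, stdout);
    putchar('\n');
}
```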

In C++, you can use the string class instead of dealing with the null char.

Just for completeness, and to fully answer the question:

vector<char>

Use std::string.

There are dozens of other ways to store strings, but using a library is often better than making your own. I'm sure we could all come up with plenty of wacky ways of doing strings without null terminators :).

In C there generally won't be an easier solution. You could do what Pascal did and put the length of the string in the first character, but this is a bit of a pain and limits your string length to the largest value a char can hold (255 for an 8-bit unsigned char). In C++ I'd definitely use the std::string class, which you get with

#include <string>

Being a commonly used library this will almost certainly be more reliable than rolling your own string class.

The reason for the NUL termination is so that whatever handles the string can determine its length. If you don't use a NUL terminator, you need to pass the string's length, either through a separate parameter/variable or as part of the string. Otherwise, you could use another delimiter, as long as it isn't used within the string itself.
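The "separate parameter" approach can be sketched as follows; count_char is a hypothetical example function. Because the caller passes the length, the function never looks for a terminator and works equally well on arrays that have none.

```c
#include <stddef.h>

/* Count occurrences of c in the first len bytes of s.
   No '\0' is required or examined. */
size_t count_char(const char *s, size_t len, char c)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == c)
            count++;
    return count;
}
```

The delimiter variant would replace len with a scan for some chosen character that never occurs inside the text, which is exactly what the zero-terminated convention does with the 0 code.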

To be honest, I don't quite understand your question, or if it actually is a question.

Even the string class will store it with a null. If for some reason you absolutely do not want a null character at the end of your string in memory, you'd have to manually create a block of characters, and fill it out yourself.

I can't personally think of any realistic scenario in which you'd want to do this, since the null character is what signals the end of the string. If you're storing the length of the string too, then I guess you've saved one byte at the cost of whatever the size of your length variable is (likely 4 or 8 bytes), and gained faster access to the length of said string.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 