
0x00 and char arrays

Why do char arrays stop right before a 0x00 byte, and how can this be avoided (perhaps by using another datatype (which one, and why?) or a "trick" with char)?

For example, in the following code the output is only "a"; the other bytes are not displayed:

unsigned char cbuffer[]={0x61,0x00,0x62,0x63,0x0};
std::string sbuffer=reinterpret_cast<const char*>(cbuffer);

cout << sbuffer << endl;

Similarly, in the following code the output is "ab":

unsigned char cbuffer[]={0x61,0x62,0x00,0x63,0x0};
std::string sbuffer=reinterpret_cast<const char*>(cbuffer);

Straightforward and effective workarounds (where 0x00 is kept in the array as a normal byte) would be appreciated.

It's common in C to pass strings around as pointers to null-terminated char arrays. The null terminator is the byte 0x00. To make conversion easy, std::string is constructible from a pointer to a null-terminated char array, which is what is happening in your code. But when the constructor finds the null, it treats that as the end of the string. If you cout a char array directly, you'll find it makes the same assumption, because neither has any other way to determine the end of a string pointed to by a char*. (They could theoretically tell the length in your case if they accepted a char (&)[N] array reference, but sadly almost nothing in the standard library does.)

The intended workarounds are to use the pointer-and-length constructor instead:

int len = sizeof(cbuffer)/sizeof(cbuffer[0]);
std::string sbuffer(reinterpret_cast<const char*>(cbuffer), len); //5 characters in cbuffer, 1 byte each

or

int len = sizeof(cbuffer)/sizeof(cbuffer[0]);
std::cout.write(reinterpret_cast<const char*>(cbuffer), len); //5 characters in cbuffer, 1 byte each

However, you have to be careful with sizeof(cbuffer). If cbuffer is a char* (pointer) instead of a char(&)[] (array), then sizeof(cbuffer) returns the size of the pointer rather than the length of the buffer, and if the data is not null-terminated there is no way to recover the correct length at that point.
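One way to sidestep the sizeof pitfall described above is to let the compiler deduce the array length. The following is only a sketch: `bytes_to_string` and `size_seen_through_pointer` are hypothetical helper names, not standard library functions.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: N is deduced from the array type, so no sizeof
// arithmetic is needed. Passing a plain pointer does not compile, which
// turns the sizeof(ptr) mistake into a compile-time error.
template <std::size_t N>
std::string bytes_to_string(const unsigned char (&bytes)[N]) {
    return std::string(reinterpret_cast<const char*>(bytes), N);
}

// What goes wrong once the array has decayed to a pointer: sizeof yields
// the size of the pointer itself, not the length of the buffer.
std::size_t size_seen_through_pointer(const unsigned char* p) {
    return sizeof(p);  // typically 4 or 8, regardless of the buffer
}
```

Calling `bytes_to_string(cbuffer)` on the five-byte array from the question should give a string of size 5 with both zero bytes kept.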

char arrays don't do anything.

The C string functions use 0 to mark the end of a string.
std::cout is overloaded for char arrays to print them as C strings. If you want to print the individual values, you need to loop over them; you might also want to output them with std::hex.
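The loop suggested here could look roughly like the sketch below. `to_hex` is a hypothetical helper name, and the two-digit, space-separated layout is an assumption about the desired output rather than anything the answer specifies.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: formats each byte as two hex digits, separated by spaces.
std::string to_hex(const unsigned char* data, std::size_t len) {
    std::ostringstream out;
    out << std::hex << std::setfill('0');
    for (std::size_t i = 0; i < len; ++i) {
        if (i != 0) out << ' ';
        // Cast to int so the value is formatted as a number, not as a character.
        out << std::setw(2) << static_cast<int>(data[i]);
    }
    return out.str();
}
```

Because the length is passed explicitly, zero bytes are printed like any other value instead of ending the output.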

In this case you are creating a std::string from a C char array, so the std::string constructor assumes that the C string ends at the first '\0'. Since it is only passed an address in memory, how else could it know where the string ends?

P.S. If you want to store an array of bytes, you should probably be using std::vector.
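As a rough illustration of the std::vector suggestion, a vector of unsigned char keeps its own size, so a zero byte is just another element. The function names below are hypothetical and exist only for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical example data: the same five bytes used in the question.
std::vector<unsigned char> make_bytes() {
    return {0x61, 0x00, 0x62, 0x63, 0x00};
}

// Counts the zero bytes; nothing stops at the first one because the
// vector tracks its length separately from its contents.
std::size_t count_zero_bytes(const std::vector<unsigned char>& bytes) {
    return static_cast<std::size_t>(
        std::count(bytes.begin(), bytes.end(), static_cast<unsigned char>(0)));
}
```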

Try:

#include <iostream>
#include <string>

int main()
{

    unsigned char cbuffer[]={0x61,0x62,0x00,0x63,0x0};

    // Here s1 treats cbuffer as a C string,
    // so it only reads up to the first '\0' character.
    std::string s1(reinterpret_cast<const char*>(cbuffer));
    std::cout << s1 << "\n";

    // Here s2 is treating the cBuffer as an array.
    // It reads the specified length into the string.
    std::string s2(reinterpret_cast<const char*>(cbuffer), sizeof(cbuffer)/sizeof(cbuffer[0]));

    // Note: the '\0' character is written too, but the terminal may show nothing for it.
    std::cout << s2 << "\n";

}

The 0x00 byte is used as a sentinel to mark the end of the string in C. The entire array, however, remains in memory. You can use an alternate constructor for std::string if you want the string to contain the entire character array. Passing that string to anything that expects a C string (for example via c_str()) would still give you only "ab"; streaming the std::string itself writes every character, although the 0x00 byte is usually invisible on a terminal. The decision to represent C strings in this manner is one of those arbitrary decisions that we are stuck with.
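A small sketch of the difference between the length the string stores and the length a C function sees. The helper names are hypothetical; the buffer is the "ab" example from the question.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Builds a std::string holding all five bytes, embedded zero included.
std::string make_full_string() {
    const unsigned char cbuffer[] = {0x61, 0x62, 0x00, 0x63, 0x0};
    return std::string(reinterpret_cast<const char*>(cbuffer), sizeof(cbuffer));
}

// Length as seen by C string functions, which stop at the first zero byte.
std::size_t c_style_length(const std::string& s) {
    return std::strlen(s.c_str());
}
```

Here `make_full_string().size()` is 5, while `c_style_length` on the same string reports 2.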

0x00 is a non-printing character; 0x00 through 0x1F are all non-printing chars, although some serve as line breaks. 0x00 serves to terminate a string.

What do you want to be substituted (and printed) for 0x00 in the resulting string?
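If a visible placeholder is wanted, one possible approach is to replace each zero byte before printing. This is only a sketch: `escape_nuls` is a hypothetical helper, and the choice of the two-character text `\0` as the placeholder is an assumption.

```cpp
#include <string>

// Hypothetical helper: returns a copy of the input in which every zero
// byte is replaced by the two visible characters '\' and '0'.
std::string escape_nuls(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (char c : in) {
        if (c == '\0') {
            out += "\\0";
        } else {
            out += c;
        }
    }
    return out;
}
```

The input has to be a std::string built with an explicit length, otherwise the zero bytes are already lost before this function sees them.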

The constructor is responsible for converting the char[] into a string. As others pointed out, you must use a different constructor. The code below works for me, but it is not very robust. The first parameter must be a pointer to the array (you are free to use a safer cast) and the second parameter is the length of the array (you are free to calculate it in a more sophisticated way).

#include <iostream>
#include <string>
int main() {
  unsigned char cbuffer[]={0x61,0x00,0x62,0x63,0x00};
  std::string sbuffer((char *)cbuffer,5);
  std::cout << sbuffer << std::endl;
}
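A variant that avoids both the cast and the hard-coded 5 is the iterator-range constructor of std::string. The sketch below assumes the buffer is still a real array in scope (not a pointer); `build_from_range` is a hypothetical function name.

```cpp
#include <iterator>
#include <string>

std::string build_from_range() {
    const unsigned char cbuffer[] = {0x61, 0x00, 0x62, 0x63, 0x00};
    // The iterator-range constructor copies every element, zero bytes
    // included, and needs neither a cast nor a hard-coded length.
    return std::string(std::begin(cbuffer), std::end(cbuffer));
}
```

std::begin and std::end only work on the array itself, so this shares the limitation of sizeof once the array has decayed to a pointer.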

