Strangeness with special characters in C-strings and unprintable ASCII

I need to copy a C++ string into a char array and then decode it. The char array does not need to be null-terminated. Because of the encoding, many of the characters are unusual and some are non-printable, and this is causing issues.

This is what the C++ string prints as:

std::cout << myString;

Output:

mw\22ypwr`himg 0few1nvnl

This is converted into a char[] as follows:

char * m = new char[myString.size() + 1];
strcpy(m, myString.c_str());

The resulting m has a length of 24 and is not correct: it fails to decode properly. The following char[] does decode properly:

char m2 [] = "mw\22ypwr`himg 0few1nvnl";

Note that this was created by copying the printed output of the string. However, the length of this C-string is only 22, not 24. Furthermore, printing it gives the following result:

std::cout << m2;

Output:

mwypwr`himg 0few1nvnl

Note that the \22 is gone. However, it's not as simple as removing it from the string before converting it to a char[]. Iterating through the ASCII values shows that there is a character with the decimal value 18 where the \22 used to be. This character does not print.

ASCII values as decimal:

109 119 18 121 112 119 114 96 104 105 109 103 32 48 102 101 119 49 110 118 110 108 
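The 22 versus 24 length difference and the value 18 can be reproduced in a few lines. This is a sketch; byteValues, kCompiled and kRuntime are hypothetical names. kRuntime assumes that myString really contains a backslash followed by the digits 2 and 2 (which is what its printed output suggests); that content is spelled \\22 in source so the compiler leaves the backslash alone.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decimal value of every byte in data[0..n).
// Casting through unsigned char avoids negative values for bytes >= 128
// on platforms where plain char is signed.
std::vector<int> byteValues(const char* data, std::size_t n) {
    std::vector<int> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.push_back(static_cast<int>(static_cast<unsigned char>(data[i])));
    }
    return out;
}

// Escape processed by the compiler: \22 becomes ONE char with value 18.
const char kCompiled[] = "mw\22ypwr`himg 0few1nvnl";

// Assumed runtime content of myString: a real backslash followed by
// '2' and '2' (written as \\22 here so the compiler keeps the backslash).
const std::string kRuntime = "mw\\22ypwr`himg 0few1nvnl";
```

byteValues(kCompiled, sizeof kCompiled - 1) yields the 22 values listed above, with 18 at index 2. byteValues(kRuntime.data(), kRuntime.size()) yields 24 values, with 92 (the backslash) at index 2 followed by two 50s (the digit '2').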

Why does the \22 get converted into ASCII character 18? How can I construct the proper, decodable C-string from the C++ string that contains the literal \22? I need to do this for a large list of potentially unknown encoded strings, so I would prefer not to manually replace \22 with ASCII 18 without at least knowing why this happens.

The character string contains escape sequences that denote octal characters.

"mw\22ypwr\...other characters..."

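The \22 is octal for decimal 18, which is why you see 18 when you display the numeric value of each character. Escape sequences are translated only by the compiler, inside string and character literals in source code; a std::string whose contents are produced at run time keeps the backslash and the two digits as three separate characters. That is why myString has 24 characters while m2 has 22.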
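Because the escape is only interpreted by the compiler, strings that arrive at run time with a literal \22 have to be decoded by your own code before they reach the decoder. Below is a minimal sketch, assuming the encoded strings use only C-style octal escapes (a backslash followed by one to three digits 0-7). unescapeOctal is a hypothetical name, and any other backslash sequence (\n, \x12, \\) is deliberately copied through unchanged, so extend it if your data uses those.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: expands C-style octal escapes (\ followed by 1 to 3
// digits 0-7) in text obtained at run time, mimicking what the compiler does
// for string literals. Any other backslash is copied through unchanged.
std::string unescapeOctal(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '\\' && i + 1 < in.size() && in[i + 1] >= '0' && in[i + 1] <= '7') {
            unsigned value = 0;
            std::size_t j = i + 1;
            // Consume at most three octal digits, as the C/C++ lexer does.
            while (j < in.size() && j <= i + 3 && in[j] >= '0' && in[j] <= '7') {
                value = value * 8 + static_cast<unsigned>(in[j] - '0');
                ++j;
            }
            // Keep only the low 8 bits (escapes above \377 do not fit in a char).
            out.push_back(static_cast<char>(static_cast<unsigned char>(value & 0xFFu)));
            i = j - 1;  // the loop's ++i then moves past the last digit
        } else {
            out.push_back(in[i]);
        }
    }
    return out;
}
```

unescapeOctal(myString) turns the 24-character mw\22ypwr... into the same 22 characters as m2. An escape such as \0 produces an embedded zero byte, which std::string stores without trouble but strcpy will stop at.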

If the C++ string contains an embedded zero byte (std::string can hold them), this won't work:

strcpy(m, myString.c_str());

strcpy copies only until it encounters a zero byte; use memcpy with myString.size() instead.
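The memcpy suggestion can be sketched as follows. It copies exactly myString.size() bytes and adds no terminator, which fits the question's statement that the char array need not be null-terminated. Returning a std::vector<char> instead of a raw new[] buffer is this sketch's choice, not part of the original answer; toCharArray is a hypothetical name.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copies exactly s.size() bytes, so embedded '\0'
// bytes survive and no terminator is appended.
std::vector<char> toCharArray(const std::string& s) {
    std::vector<char> buf(s.size());
    // Skip the call for empty input: data() of an empty vector may be null,
    // and passing a null pointer to memcpy is undefined even with size 0.
    if (!s.empty()) {
        std::memcpy(buf.data(), s.data(), s.size());
    }
    return buf;
}
```

With a raw buffer the equivalent is char* m = new char[myString.size()]; std::memcpy(m, myString.data(), myString.size()); the length then has to be tracked separately, because strlen cannot be used on an unterminated array.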
