How to inject non-ASCII characters into a string literal in C/C++

Question

I have a program that reads in a character array. I need the value of the string in memory to be equal to hex 0x01020304, which are all non-ASCII characters. So the question is, how do I pass non-ASCII characters into a string literal variable at runtime?

Answer 1

Use an escape sequence. Make sure you put the characters in the correct order.

"\x01\x02\x03\x04"

Edit: If you need to put the sequence into an existing char array, simply assign it in.

char s[4];

// ... later ...
s[0] = 0x01;
s[1] = 0x02;
s[2] = 0x03;
s[3] = 0x04;

Do not attempt to assign the number by casting s to (int32_t *); the char array doesn't have the correct alignment.
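If the bytes are already available as a block, one way to avoid both the element-by-element assignment and the cast is std::memcpy, which copies byte by byte and has no alignment requirement. This is only a minimal sketch, assuming a 4-byte destination array; the function name fill_bytes is hypothetical and not part of the answer.

```cpp
#include <cstring>

// Hypothetical helper: writes the bytes 01 02 03 04 into an existing
// 4-byte char array. memcpy copies byte by byte, so the destination
// needs no particular alignment.
void fill_bytes(char (&s)[4]) {
    // The literal is 5 bytes long (4 + terminating '\0'); copy only 4.
    std::memcpy(s, "\x01\x02\x03\x04", 4);
}
```

Because only 4 bytes are copied, s is not null-terminated afterwards and should be treated as a byte buffer rather than a C string.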

Answer 2

Probably the easiest, in C, is to use the hex escape notation: "\x01\x02\x03\x04". (Without the x, the values are in octal, which isn't nearly as popular or understandable nowadays.)

Alternatively,

char x[] = {1, 2, 3, 4, 0};

should work (notice that the null termination has to be included when initializing like this).
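To make the comparison concrete, here is a small sketch showing that the hex escapes, the octal escapes, and the brace initializer all produce the same five bytes. The names hex_form, octal_form, array_form and same_bytes are made up for illustration.

```cpp
#include <cstring>

// Three spellings of the same data: bytes 01 02 03 04 plus a terminator.
const char hex_form[]   = "\x01\x02\x03\x04";  // hex escapes
const char octal_form[] = "\1\2\3\4";          // octal escapes (no x)
const char array_form[] = {1, 2, 3, 4, 0};     // explicit terminator

// Each array is 5 bytes: a string literal adds the '\0' implicitly,
// the brace initializer only contains what is written.
static_assert(sizeof(hex_form) == 5, "literal includes the terminator");
static_assert(sizeof(octal_form) == 5, "literal includes the terminator");
static_assert(sizeof(array_form) == 5, "terminator written by hand");

// Returns true when all three arrays hold identical bytes.
bool same_bytes() {
    return std::memcmp(hex_form, octal_form, sizeof(hex_form)) == 0 &&
           std::memcmp(hex_form, array_form, sizeof(hex_form)) == 0;
}
```

One thing to watch with hex escapes: they consume every following hex digit, so a byte 0x01 followed by the letter A has to be written as two adjacent literals, "\x01" "A", rather than "\x01A".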

Answer 3

I need the value of the string in memory to be equal to hex 0x01020304 which are all non-ASCII characters.

Beware: how 4 contiguous bytes are laid out in memory depends on whether your system is big-endian or little-endian. If you care about how the 32-bit field works, just putting things into a string literal won't work.

For example, you could try, as avakar suggests:

char cString[5] = "\x01\x02\x03\x04";

or even just do

cString[0] = 0x01;
cString[1] = 0x02;
...

but if you expect the actual physical layout in memory to make sense:

// assuming unsigned int is 32 bits
// (illustration only: this cast can violate alignment and aliasing rules)
unsigned int* cStringAlias = reinterpret_cast<unsigned int*>(&cString[0]);
std::cout << std::hex << (*cStringAlias);

Be careful: the output will differ depending on whether the most significant byte is placed in the 0th location or the 3rd location.

The output could be

0x01020304

or

0x04030201

For more, read about endianness.
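The two possible results can also be observed without a pointer cast, by copying the bytes into an integer with std::memcpy. This is only a sketch; load_native_u32 and is_little_endian are hypothetical names, and it assumes a platform that provides std::uint32_t.

```cpp
#include <cstdint>
#include <cstring>

// Reads 4 bytes from p into a 32-bit integer using the machine's own
// byte order. memcpy avoids the alignment and aliasing problems of
// casting a char* to an integer pointer.
std::uint32_t load_native_u32(const char* p) {
    std::uint32_t value = 0;
    std::memcpy(&value, p, sizeof(value));
    return value;
}

// Returns true if the least significant byte is stored first.
bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

With cString holding "\x01\x02\x03\x04", load_native_u32(cString) yields 0x01020304 on a big-endian machine and 0x04030201 on a little-endian one.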

Answer 4

Well, are you sure you need a string literal?

These are all pretty similar:

const char* blah = "test";
char blah[] = "test";
char blah[] = { 't','e','s','t',0 };

You could certainly use the third form for your needs quite easily.

Answer 5

Since you are talking about injection, I'll give you a clue (this is useful for a code injection that exploits a buffer overflow vulnerability, for academic purposes). You have to configure your terminal to accept Unicode (on my Mac you can type such characters by default). When you enter a Unicode character such as ∫, it does not take just one byte in memory like a regular char; it takes more bytes (two, three or four), so if you have an array

char v[4];

and if you use

gets(v); //insecure function to read

and enter ∫, the 4 bytes that v occupies in memory will be filled with these values (in decimal):

-30
-120
-85
0

If you look at any of those positions, none of them is printable ASCII. Bytes like these could be code that you get into memory and then make the program execute by changing a return address on the stack, exploiting the same buffer overflow vulnerability that gets() allows. (To get the code, open your program in a hex editor to see how everything looks once it is compiled.)
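The decimal values listed above are simply the UTF-8 encoding of ∫ (U+222B), the bytes E2 88 AB, read as signed chars. A small sketch to check this, assuming a platform with 8-bit two's-complement chars; as_signed_values and integral_sign are hypothetical names.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns each byte of s interpreted as a signed
// 8-bit value, which is how the bytes look in a signed char array.
std::vector<int> as_signed_values(const std::string& s) {
    std::vector<int> out;
    for (unsigned char byte : s) {
        out.push_back(static_cast<signed char>(byte));
    }
    return out;
}

// UTF-8 encoding of U+222B written with hex escapes, so the result
// does not depend on the encoding of the source file.
const std::string integral_sign = "\xE2\x88\xAB";
```

Calling as_signed_values(integral_sign) gives -30, -120 and -85; the fourth value in the list above, 0, is the terminator that gets() appends.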

So you just have to find the right Unicode characters that match what you need, for example by printing them to a file.

This link gives an idea of how memory is allocated on the stack: http://eli.thegreenplace.net/2011/02/04/where-the-top-of-the-stack-is-on-x86/

(It seems that @Ben does not even have an account anymore, but this is for anyone learning secure programming who needs it.)

Answer 6

Save the source in UTF-8 and treat all strings as UTF-8 (or use something like StringFromUTF()).

Each time you don't work in a universal code page (yes, UTF-8 is not really a code page...) you are asking for trouble.

Answer 7

When writing C code, you can use memcpy() to copy binary data:

memcpy(dest + offset, src, 4);

If src is a string, you presumably get it in the right order. If it's an integer (say, uint32_t) and you need a specific endianness, you might need to reverse the order of the bytes before doing memcpy():

uint32_t src;

...

swap((unsigned char *) &src, 0, 3);
swap((unsigned char *) &src, 1, 2);

where swap() is defined by you. You must do this only if the machine endianness doesn't match the desired output endianness.

You can discover the endianness by looking at certain defines set by the compiler or C library. At least on glibc (Linux), endian.h provides such definitions, and byteswap.h also provides byte-swapping functions.
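Putting the pieces of this answer together, a sketch might look like the following. The answer leaves swap() to the reader, so swap_bytes here is just one possible definition, and copy_as_big_endian and machine_is_little_endian are hypothetical names. Endianness is detected at run time rather than through endian.h, purely to keep the example portable.

```cpp
#include <cstdint>
#include <cstring>

// One possible definition of the answer's swap(): exchanges the bytes
// at positions i and j.
void swap_bytes(unsigned char* bytes, int i, int j) {
    unsigned char tmp = bytes[i];
    bytes[i] = bytes[j];
    bytes[j] = tmp;
}

// Returns true if the machine stores the least significant byte first.
bool machine_is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Hypothetical wrapper: copies src into dest so that the most
// significant byte comes first, whatever the machine's byte order.
void copy_as_big_endian(unsigned char* dest, std::uint32_t src) {
    if (machine_is_little_endian()) {
        // Reverse the four bytes, as in the answer.
        swap_bytes(reinterpret_cast<unsigned char*>(&src), 0, 3);
        swap_bytes(reinterpret_cast<unsigned char*>(&src), 1, 2);
    }
    std::memcpy(dest, &src, 4);
}
```

Calling copy_as_big_endian(buf, 0x01020304) leaves the bytes 01 02 03 04 in buf on either kind of machine, which is the layout the question asks for.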

Answer 8

You may want to try using std::hex:

int temp;
char sentMessage[10];
for (int i = 0; i < 10; ++i)
{
    std::cin >> std::hex >> temp;
    sentMessage[i] = temp;
}

You would then type in the hexadecimal value of each character, e.g. 01 11 7F AA
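The same loop can be wrapped in a function that reads from any input stream, which makes it easy to try with a string instead of the keyboard. This is a sketch under that assumption; read_hex_bytes is a hypothetical name.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <vector>

// Hypothetical helper: reads up to count whitespace-separated hex
// values (e.g. "01 11 7F AA") and stores each one as a byte.
// Stops early if the input runs out or a token is not valid hex.
std::vector<unsigned char> read_hex_bytes(std::istream& in, std::size_t count) {
    std::vector<unsigned char> bytes;
    int temp = 0;
    while (bytes.size() < count && (in >> std::hex >> temp)) {
        // Keep only the low 8 bits, as assigning to a char would.
        bytes.push_back(static_cast<unsigned char>(temp & 0xFF));
    }
    return bytes;
}
```

For example, constructing std::istringstream in("01 11 7F AA") and calling read_hex_bytes(in, 4) returns the four bytes 0x01, 0x11, 0x7F and 0xAA. Passing std::cin instead reads them from the console as in the loop above.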

Answer 9

You can use std::wcin and std::wcout for Unicode support on the console. However, I am not sure whether they are part of the standard.

9 answers

Answer 1
17 2009-06-08 18:02:37

Answer 2
4 2009-06-08 18:04:51

Answer 3
3 2009-06-08 18:05:01

Answer 4
2 2009-06-08 18:04:08

Answer 5
1 2016-03-08 04:28:32

Answer 6
1 2009-06-08 18:13:26

Answer 7
1 2009-06-08 18:26:16

Answer 8
0 2009-06-08 18:13:45

Answer 9
0 2009-06-08 18:17:49
