
How to inject non-ASCII characters into a string literal in C/C++

I have a program that reads in a character array. I need the value of the string in memory to be equal to hex 0x01020304, all of whose bytes are non-ASCII characters. So the question is: how do I pass non-ASCII characters into a string literal variable at runtime?

Use an escape sequence. Make sure you put the characters in the correct order.

"\x01\x02\x03\x04"

Edit: If you need to put the sequence into an existing char array, simply assign it in.

char s[4];

// ... later ...
s[0] = 0x01;
s[1] = 0x02;
s[2] = 0x03;
s[3] = 0x04;

Do not attempt to assign the number by casting s to (int32_t *); the char array is not guaranteed to have the correct alignment for an int32_t (and the cast violates strict-aliasing rules anyway).
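For illustration, here is a minimal sketch combining both approaches; the memcmp check is my addition, just to show the two spellings produce the same four bytes:

#include <cstdio>
#include <cstring>

int main()
{
    const char lit[] = "\x01\x02\x03\x04"; // 5 bytes: 01 02 03 04 00

    char s[4];
    s[0] = 0x01;
    s[1] = 0x02;
    s[2] = 0x03;
    s[3] = 0x04;

    // Both spellings put the same four bytes in memory.
    std::printf("%s\n", std::memcmp(lit, s, 4) == 0 ? "equal" : "different");
}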

Probably the easiest, in C, is to use the hex escape notation: "\x01\x02\x03\x04". (Without the x, the values are in octal, which isn't nearly as popular or understandable nowadays.)

Alternatively,

char x[] = {1, 2, 3, 4, 0};

should work (notice that the null terminator has to be included explicitly when initializing like this).
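A small sketch of the difference the explicit terminator makes (the array names are illustrative):

#include <cstdio>

int main()
{
    char x[] = {1, 2, 3, 4, 0}; // 5 bytes, usable as a C string
    char y[] = {1, 2, 3, 4};    // 4 bytes, NOT null-terminated

    std::printf("sizeof x = %zu, sizeof y = %zu\n", sizeof x, sizeof y); // 5, 4
}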

I need the value of the string in memory to be equal to hex 0x01020304 which are all non-ASCII characters.

Beware: how 4 contiguous bytes are laid out in memory depends on whether your system is big-endian or little-endian. If you care how the bytes map onto a 32-bit value, just putting them into a string literal isn't enough.

For example:

You could try, as avakar suggests:

char cString[5] = "\x01\x02\x03\x04";

or even just do

cString[0] = 0x01;
cString[1] = 0x02;
...

but if you then want to read the physical layout in memory back as a 32-bit integer:

// assuming unsigned int is 32 bits
// (note: this aliasing cast has alignment and strict-aliasing caveats; see above)
unsigned int* cStringAlias = reinterpret_cast<unsigned int*>(&cString[0]);
std::cout << std::hex << *cStringAlias << std::endl;

Be careful: the output will differ depending on whether the most significant byte is placed at the 0th location or the 3rd location.

The output could be

0x01020304

or

0x04030201

For more, read about endianness.
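If you need the same 32-bit value on either kind of machine, you can assemble it from the bytes with shifts instead of aliasing the array; a minimal sketch (variable names are my own):

#include <cinttypes>
#include <cstdint>
#include <cstdio>

int main()
{
    unsigned char cString[4] = {0x01, 0x02, 0x03, 0x04};

    // Shifts are endianness-independent: this yields 0x01020304
    // no matter how the bytes are laid out in memory.
    std::uint32_t value = (std::uint32_t(cString[0]) << 24) |
                          (std::uint32_t(cString[1]) << 16) |
                          (std::uint32_t(cString[2]) << 8)  |
                           std::uint32_t(cString[3]);

    std::printf("0x%08" PRIx32 "\n", value); // prints 0x01020304 on any host
}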

Well, are you sure you need a string literal?

These are all pretty similar:

const char* blah = "test";
char blah[] = "test";
char blah[] = { 't','e','s','t',0 };

You could certainly use the third form for your needs quite easily.

Since you are talking about injection, I'll give you a clue (this is useful for code injection that exploits a buffer-overflow vulnerability, for academic purposes)... You have to configure your terminal to accept Unicode (on my Mac it accepts it by default), so you can type things like ∫. When you enter a Unicode character, it does not take just one byte in memory like a regular char; it takes more (two, three, or four bytes). So if you have an array

char v[4];

and if you use

gets(v); // insecure read function (removed from the C standard in C11)

and enter ∫, the 4 bytes that v occupies in memory will be filled with these values (in decimal, reading each byte as a signed char):

-30
-120
-85
0

Looking at the individual positions, none of them is printable ASCII. Bytes like these could be code that you get into memory and make the program execute, for instance by overwriting a return address on the stack through the same buffer-overflow vulnerability that gets() allows. (To see what such code looks like, open your program in a hex editor and inspect how everything is laid out once it is compiled!)

So you just have to find Unicode characters whose encodings match the bytes you need, for example by printing candidates to a file and inspecting them.
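A minimal sketch of that kind of probing, assuming the compiler's source and execution character sets are UTF-8 (the default on typical macOS/Linux toolchains):

#include <cstdio>

int main()
{
    const char* s = "∫"; // encoded by the compiler as UTF-8: E2 88 AB

    // Read back as signed chars these print -30, -120, -85,
    // matching the values listed above.
    for (const char* p = s; *p != '\0'; ++p)
        std::printf("%d\n", static_cast<signed char>(*p));
}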

This link gives an idea of how memory is laid out on the stack: http://eli.thegreenplace.net/2011/02/04/where-the-top-of-the-stack-is-on-x86/

(It seems that @Ben does not even have an account anymore, but this is left here for anyone learning secure programming who needs it.)

Save the source in UTF-8 and treat all strings as UTF-8 (or use something like StringFromUTF()).

Any time you don't work in a universal code page (yes, UTF-8 is not really a code page...) you are asking for trouble.

When writing C code, you can use memcpy() to copy binary data:

memcpy(dest + offset, src, 4);

If src is a string, you presumably already have the bytes in the right order. If it's an integer (say, uint32_t) and you need a specific endianness, you might need to reverse the order of its bytes before doing memcpy():

uint32_t src;

...

swap((unsigned char *) &src, 0, 3);
swap((unsigned char *) &src, 1, 2);

where swap() is a byte-swapping function defined by you. You only need to do this if the machine's endianness doesn't match the desired output endianness.

You can discover the endianness by looking at certain defines set by the compiler or C library. At least on glibc (Linux), endian.h provides such definitions, and byteswap.h also provides byte-swapping functions.
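For instance, on glibc you could sketch the endian-aware copy like this; htobe32() from endian.h converts host order to big-endian (this is glibc-specific, not portable C):

#include <cstdint>
#include <cstdio>
#include <cstring>
#include <endian.h> // glibc-specific: htobe32() and friends

int main()
{
    std::uint32_t src = 0x01020304;
    unsigned char dest[4];

    // htobe32() gives big-endian byte order, so the bytes land in
    // memory as 01 02 03 04 regardless of the host's endianness.
    std::uint32_t be = htobe32(src);
    std::memcpy(dest, &be, 4);

    for (unsigned char b : dest)
        std::printf("%02x ", b); // prints: 01 02 03 04
    std::printf("\n");
}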

You may want to try using std::hex:

#include <iostream>

int main()
{
    int temp;
    char sentMessage[10];
    for (int i = 0; i < 10; ++i)
    {
        std::cin >> std::hex >> temp;
        sentMessage[i] = temp;
    }
}

You would then type in the hexadecimal value of each character, e.g. 01 11 7F AA.

You can use std::wcin and std::wcout for wide-character (Unicode) console I/O. They are part of the C++ standard library, declared in <iostream>.
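A minimal sketch, assuming the terminal's locale can actually represent the characters (the locale setup is platform-dependent):

#include <iostream>
#include <locale>

int main()
{
    // Pick up the user's locale so wide characters are converted
    // to the terminal's encoding on output.
    std::locale::global(std::locale(""));
    std::wcout.imbue(std::locale());

    std::wcout << L"\u222B" << L"\n"; // the integral sign used above
}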
