How does Python convert a hex string to a raw string?

I have a string of hex digits which I'm decoding into a raw string:

>>> word_str = "4954640000005200000005a7a90fb36ecd3fa2ca7ec48ca36004acef63f77157ab2f53e3f768ecd9e18547b8c22e21d01bfb6b3de325a27b8fb3acef63f77157ab2f53e3f768ecd9e185b7330fb7c95782fc3d67e7c3a66728dad8b59848c7670c94b29b54d2379e2e7a"

>>> hex_str = word_str.decode('hex')
>>> hex_str = "ITd\x00\x00\x00R\x00\x00\x00\x05\xa7\xa9\x0f\xb3n\xcd?\xa2\xca~\xc4\x8c\xa3`\x04\xac\xefc\xf7qW\xab/S\xe3\xf7h\xec\xd9\xe1\x85G\xb8\xc2.!\xd0\x1b\xfbk=\xe3%\xa2{\x8f\xb3\xac\xefc\xf7qW\xab/S\xe3\xf7h\xec\xd9\xe1\x85\xb73\x0f\xb7\xc9W\x82\xfc=g\xe7\xc3\xa6g(\xda\xd8\xb5\x98H\xc7g\x0c\x94\xb2\x9bT\xd27\x9e.z"

Looking at the ASCII table, I suppose it takes two digits at a time and converts them to the corresponding character from the table, like this:

49 -> I
54 -> T 
64 -> d
00 -> \x00  
00 -> \x00

But at some point this rule breaks:

52 -> \x00R (00 and 52)

Then it proceeds to take two digits at a time again:

00 -> \x00 
00 -> \x00 
00 -> \x00
05 -> \x05 
a7 -> \xa7 
a9 -> \xa9 
0f -> \x0f 

Here it takes two pairs (b3 and 6e) at once instead of one, and it doesn't convert b3 to the corresponding character from the extended ASCII table:

b36e -> \xb3n

Here cd becomes \xcd? ...

 cd ->  \xcd?

My goal is to implement the same thing (variable.decode('hex')) in C++, but first I need to understand what's going on. Which algorithm is being used here?

What you're asking about is the representation of the string for printing it in a human-readable format. The string itself contains the values of each byte in the original hex string (each byte being derived from two original digits).

Some of the bytes in your string are characters that aren't printable or aren't representable in ASCII. For those, Python uses an escape code: \x followed by the two original hex digits.

In your example b36e -> \xb3n, Python converts the b3 to \xb3. The next byte, 6e, is ASCII for the lowercase n, and since that's printable, it comes through verbatim. Python is not taking two pairs at a time; each byte is processed separately.
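To make the byte-by-byte decoding concrete, here is a minimal sketch in Python 3 (where `str.decode('hex')` no longer exists and `bytes.fromhex` is the built-in equivalent). The helper name `decode_hex` and the shortened sample string are made up for illustration; the loop is just one straightforward way to do it, not necessarily how Python implements it internally.

```python
def decode_hex(hex_text):
    """Turn a string of hex digits into bytes, two digits per byte."""
    if len(hex_text) % 2 != 0:
        raise ValueError("hex string must have an even number of digits")
    out = bytearray()
    for i in range(0, len(hex_text), 2):
        pair = hex_text[i:i + 2]    # e.g. "b3"
        out.append(int(pair, 16))   # e.g. 0xb3 == 179
    return bytes(out)


# Shortened sample (hypothetical, built from byte values in the question).
sample = "4954640000005200b36ecd3f"
decoded = decode_hex(sample)

# Each pair of digits became exactly one byte value.
print(list(decoded))

# The hand-written loop agrees with the built-in decoder.
print(decoded == bytes.fromhex(sample))
```

The same loop ports directly to C++: walk the input two characters at a time, convert each pair to a number in base 16, and append it to a byte buffer.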

So basically, if you want to "do the same thing" in C++, you would emit every byte with a value between 32 and 126 (inclusive) verbatim, and write anything outside that range using the \x escape.
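The rule above can be sketched in a few lines of Python 3. The function name `escape_bytes` is hypothetical, and this is a simplification: Python's own `repr` additionally escapes the backslash and the quote character and prints tab, newline and carriage return as `\t`, `\n` and `\r`, which this sketch does not attempt.

```python
def escape_bytes(data):
    """Render bytes as text: printable ASCII verbatim, the rest as \\xNN."""
    parts = []
    for value in data:                       # iterating bytes yields ints
        if 32 <= value <= 126:
            parts.append(chr(value))         # printable: keep as-is
        else:
            parts.append("\\x%02x" % value)  # otherwise: \x + two hex digits
    return "".join(parts)


# Byte values taken from the question, shortened for illustration.
raw = bytes.fromhex("0000005200b36ecd3f")
print(escape_bytes(raw))
```

Here 52 appears as R, 6e as n, and 3f as ?, because those byte values are printable ASCII; each one directly follows an escaped byte, which is why the output looks as if pairs were being merged.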

I'm not sure you really want to do the same thing in C++ though; perhaps you can explain why you want to generate a Python string representation in C++.
