简体   繁体   中英

How do I encode a 4-byte string as a single 32-bit integer?

First, a disclaimer. I'm not a CS grad nor a math major, so simplicity is important.

I have a four-character string (eg "isoy") that I need to pass as a single 32-bit integer field. Of course at the other end, I need to decode it back to a string. The string will only contain AZ, and case is not important, if that helps.

The funny part is that I'm starting with PowerShell on the sending end and Linux at the receiving end. I can use Perl or Python there, with a preference for Python. I don't actually need answers in each language, I'm most interested in a PowerShell (C# also good) example for going both ways.

To 32-bit unsigned integer:

uint x = BitConverter.ToUInt32(Encoding.ASCII.GetBytes("isoy"), 0); // 2037347177

To string:

string s = Encoding.ASCII.GetString(BitConverter.GetBytes(x));      // "isoy"

BitConverter uses the native endianness of the machine.

For Python, struct.unpack does the job (to make a 4-byte string into an int -- struct.pack goes the other way):

>>> import struct
>>> struct.unpack('i', 'isoy')[0]
2037347177
>>> struct.pack('i', 2037347177)
'isoy'
>>> 

(you can use different formats to ensure big-endian or little-endian encoding, if you need that -- '>i' and '<i' respectively -- instead of just plain 'i' which uses whatever encoding is native to the machine).

// string -> int    

uint ret = 0;
for ( int i = 0; i < 4; ++i )
{
  ret |= ( str[i] << ( i * 8 ) );
}

// int -> string
for ( int i = 0; i < 4; ++i )
{
  str[i] = ( ret >> ( i * 8 ) ) & 0xff;
}

Using PowerShell syntax you can do it this way (pretty much like dtb solution):

PS> $x = [BitConverter]::ToUInt32([byte[]][char[]]'isoy', 0)
PS> [char[]][BitConverter]::GetBytes($x) -join ''
isoy

You do have to watch out for endian-ness on the Linux side. If it is running on an Intel processor I believe should be fine (same endian-ness as the PowerShell side).

Please take a look at the struct standard library module in Python's Manual. It has two functions for this: struct.pack and struct.unpack . You can use the 'L' (unsigned long) format character for this.

Aside from byte packing, you can also consider that your 26-character alphabet can be encoded as 0-25 instead of AZ.

So without worrying about big and little endians, you can go from "letters" to a number like this:

val=letter0+letter1*26+letter2*26*26+letter3*26*26*26;

to go from val back to letters, you do something like this:

letter0=val%26;
letter1=(val/26)%26;
letter2=(val/(26*26))%26;
letter3=(val/(26*26*26))%26;

where "%" is your language's modulus operator and "/" is an integer division.

You'll obviously need a way to get from 'A'-'Z' to 0-25 and back. That's language dependent.

You can easily put this into loops. I show the loops unrolled to make things a bit more obvious.

It's more common to pack letters into bytes, so you can use shift and and bitwise operations to encode and decode. But by doing it the way I show above, you could pack six letters into a 32-bit number, rather than just four. Which is nice, since you can hold things like stock market ticker symbols in a single 32-bit value (mutual funds ticker symbols are 5 characters).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM