
Java encoding/decoding a String to/from a long

I have a String that I'd like to encode into a long in Java; I'd also like to decode it from a long back into a String. It's important that it's a "long" (primitive) and not a Long (Object). The String can have a maximum length of 128 characters, but it's generally much smaller. The String's characters are ASCII and use only the standard ASCII values (0-127), not the extended ASCII codes (128-255).

I am able to encode a String of length 8 by just storing each char in one byte of a long (8 bytes). Since each char is in the range 0-127 (7 bits), I believe I can encode up to 9 characters in a long (64 bits / 7 bits = 9.14), but I have yet to implement it.
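For reference, the 7-bits-per-character idea could be sketched roughly as below. This is only an illustration, not a definitive implementation: the class name `Ascii7Codec` is made up, and it assumes the String never contains the NUL character (code 0), because an all-zero 7-bit slot is used to mark the end of a String shorter than 9 characters.

```java
// Hypothetical helper: packs up to 9 ASCII chars into a long, 7 bits each.
final class Ascii7Codec {
    private static final int BITS = 7;       // standard ASCII fits in 7 bits
    private static final int MAX_CHARS = 9;  // 9 * 7 = 63 bits <= 64

    static long encode(String s) {
        if (s.length() > MAX_CHARS) {
            throw new IllegalArgumentException("at most 9 characters fit");
        }
        long packed = 0L;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Assumption: code 0 is reserved as the end marker, so reject it.
            if (c == 0 || c > 127) {
                throw new IllegalArgumentException("not a usable ASCII char");
            }
            packed |= ((long) c) << (BITS * i); // char i goes into slot i
        }
        return packed;
    }

    static String decode(long packed) {
        StringBuilder sb = new StringBuilder(MAX_CHARS);
        for (int i = 0; i < MAX_CHARS; i++) {
            int c = (int) ((packed >>> (BITS * i)) & 0x7F);
            if (c == 0) {
                break; // empty slot: the String ended here
            }
            sb.append((char) c);
        }
        return sb.toString();
    }
}
```

With this layout, `Ascii7Codec.decode(Ascii7Codec.encode("abc"))` gives back `"abc"`, and anything longer than 9 characters is rejected rather than silently truncated.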

I have a feeling that it may be impossible (to encode all 128 characters) but I wanted to open up the problem and see if there is a better technique.

If 128 characters is impossible, what is the maximum number of characters you can encode into a long?

PS I've also looked into hashing a bit but it seems like it fails on the decoding requirement of the question.

I believe Shannon's source coding theorem can be used to determine how much data can be compressed into 64 bits.

You'd need to achieve a 14:1 compression ratio, which is possible but highly dependent on your data-set. For example, you could compress 896 bits (128 characters at 7 bits each) to 64 bits if your input string happened to be a single character repeated 128 times. I suspect it's provably impossible to achieve this compression ratio for all strings of 128 characters: there are 2^896 distinct 128-character strings but only 2^64 distinct long values, so a reversible encoding cannot give each string its own long.

Take a look at a somewhat related question: What is the maximum compression ratio of gzip?

Also, you might get better answers on cs.stackexchange.com since this is more of a theory question than a programming question.

Without compression you can represent 12 characters at 5 bits per character in a 64-bit long. That gives you 32 possible code points in your encoding: 26 for the alphabet and 6 left over. For 7-bit ASCII you can only fit 9 characters.

Doing 128 characters in 64 bits is impossible in general (specific cases may work with compression). With 64 bits, the best you can do is represent 64 characters, and only if you limit your encoding to 2 code points and represent each one as a single bit.

Compression may be able to pull it off for certain strings, but not generally for all possible Strings of 128 characters.
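A rough sketch of such a 5-bit encoding is shown below. Everything specific in it is an assumption made for illustration: the class name `FiveBitCodec`, the choice of lowercase letters for the 26 alpha code points, and the six extra symbols (space, underscore, hyphen, period, comma, apostrophe). Since 12 characters use only 60 bits, the sketch stores the length in the remaining 4 bits.

```java
// Hypothetical helper: packs up to 12 chars from a 32-symbol alphabet into a long.
final class FiveBitCodec {
    // Assumed alphabet: 26 lowercase letters plus 6 arbitrarily chosen symbols.
    private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz _-.,'";
    private static final int BITS = 5;        // 2^5 = 32 code points
    private static final int LEN_BITS = 4;    // low 4 bits hold the length (0-12)
    private static final int MAX_CHARS = 12;  // 4 + 12 * 5 = 64 bits

    static long encode(String s) {
        if (s.length() > MAX_CHARS) {
            throw new IllegalArgumentException("at most 12 characters fit");
        }
        long packed = s.length();
        for (int i = 0; i < s.length(); i++) {
            int code = ALPHABET.indexOf(s.charAt(i));
            if (code < 0) {
                throw new IllegalArgumentException("character not in alphabet");
            }
            packed |= ((long) code) << (LEN_BITS + BITS * i);
        }
        return packed;
    }

    static String decode(long packed) {
        int len = (int) (packed & 0xF);
        if (len > MAX_CHARS) {
            throw new IllegalArgumentException("not a valid encoded value");
        }
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            // Unsigned shift so the sign bit is treated as ordinary data.
            int code = (int) ((packed >>> (LEN_BITS + BITS * i)) & 0x1F);
            sb.append(ALPHABET.charAt(code));
        }
        return sb.toString();
    }
}
```

Storing the length explicitly means all 32 code points stay available for real characters; the trade-off is that the encoded long may be negative when the twelfth character uses the top bit.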
