Java encoding/decoding a String to/from a long

I have a String that I'd like to encode into a long in Java. I'd also like to decode it from a long back into a String. It's important that it's a long (primitive) and not a Long (object). The String can have a maximum length of 128 characters, but it's generally much shorter. Its characters are ASCII and use only the standard ASCII values (0-127), not the extended ASCII codes (128-255).

I am able to encode a String of length 8 by storing each char in one byte of the long (8 bytes). Since each char is in the range 0-127 (7 bits), I believe I can encode up to 9 characters in a long (64 bits / 7 bits = 9.14), but I have yet to implement it.
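The 7-bit packing described above could look roughly like the following. This is a minimal sketch, not a definitive implementation: the class name `SevenBitPacker` and its methods are hypothetical, and it assumes the String never contains the NUL character (code 0), so that a zero field can mark the end of strings shorter than 9 characters.

```java
/** Hypothetical helper: packs up to 9 ASCII characters into a primitive long, 7 bits each. */
final class SevenBitPacker {
    private static final int BITS_PER_CHAR = 7;
    private static final int MAX_CHARS = 9;      // 9 * 7 = 63 bits, which fits in 64
    private static final long MASK = 0x7FL;      // lowest 7 bits

    private SevenBitPacker() {}

    /** Encodes a String of at most 9 characters with codes 1-127 (assumption: no NUL). */
    static long encode(String s) {
        if (s.length() > MAX_CHARS) {
            throw new IllegalArgumentException("at most " + MAX_CHARS + " characters fit");
        }
        long packed = 0L;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == 0 || c > 127) {
                throw new IllegalArgumentException("only ASCII codes 1-127 are supported");
            }
            // Character i occupies bits 7*i through 7*i + 6.
            packed |= ((long) c) << (BITS_PER_CHAR * i);
        }
        return packed;
    }

    /** Decodes a long produced by encode back into the original String. */
    static String decode(long packed) {
        StringBuilder sb = new StringBuilder(MAX_CHARS);
        for (int i = 0; i < MAX_CHARS; i++) {
            int code = (int) ((packed >>> (BITS_PER_CHAR * i)) & MASK);
            if (code == 0) {
                break; // a zero field is padding, so the string ends here
            }
            sb.append((char) code);
        }
        return sb.toString();
    }
}
```

With this layout, `SevenBitPacker.decode(SevenBitPacker.encode("hello"))` gives back `"hello"`. If NUL must be representable, the one spare bit is not enough to store a length, so a different padding scheme would be needed.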

I have a feeling that it may be impossible (to encode all 128 characters), but I wanted to open up the problem and see if there is a better technique.

If 128 characters is impossible, what is the maximum number of characters you can encode into a long?

PS: I've also looked into hashing a bit, but it seems to fail the decoding requirement of the question.

I believe Shannon's source coding theorem can be used to determine how much data can be compressed into 64 bits.

You'd need to achieve a 14:1 compression ratio, which is possible but highly dependent on your data set. For example, you could compress 896 bits (128 characters at 7 bits each) to 64 bits if your input string happened to be a single character repeated 128 times. I suspect it's provably impossible to achieve this compression ratio for all strings of 128 characters.
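The suspicion above can be backed by a simple counting (pigeonhole) argument, sketched here for strings of exactly 128 characters drawn from the 128 standard ASCII codes:

```latex
\[
\underbrace{128^{128}}_{\text{distinct strings}} = \left(2^{7}\right)^{128} = 2^{896}
\qquad\text{versus}\qquad
\underbrace{2^{64}}_{\text{distinct long values}}
\]
\[
2^{896} > 2^{64}
\;\Longrightarrow\;
\text{no injective map from strings to longs exists.}
\]
```

Any encoding therefore sends at least two different strings to the same long, and those strings cannot both be decoded correctly. A lossless scheme can only cover a subset of at most 2^64 strings.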

Take a look at a somewhat related question: What is the maximum compression ratio of gzip?

Also, you might get better answers on cs.stackexchange.com, since this is more of a theory question than a programming question.

Without compression, you can represent 12 characters at 5 bits per character in a 64-bit long. That gives you 32 possible code points in your encoding: 26 for the alphabet and 6 left over. For 7-bit ASCII, you can only fit 9 characters.
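As a rough illustration of the 5-bit idea, here is a sketch that packs up to 12 characters into a long. The class name `FiveBitPacker` and the particular 32-symbol alphabet are assumptions made for this example: index 0 is reserved as padding, indices 1-26 are the lowercase letters, and the remaining five symbols are an arbitrary choice.

```java
/** Hypothetical helper: packs up to 12 characters from a 32-symbol alphabet, 5 bits each. */
final class FiveBitPacker {
    // Assumed alphabet: index 0 = padding, 1-26 = 'a'-'z', 27-31 = five illustrative extras.
    private static final String ALPHABET = "\0abcdefghijklmnopqrstuvwxyz .-_,";
    private static final int BITS_PER_CHAR = 5;
    private static final int MAX_CHARS = 12;     // 12 * 5 = 60 bits, which fits in 64
    private static final long MASK = 0x1FL;      // lowest 5 bits

    private FiveBitPacker() {}

    /** Encodes a String of at most 12 characters, all taken from the assumed alphabet. */
    static long encode(String s) {
        if (s.length() > MAX_CHARS) {
            throw new IllegalArgumentException("at most " + MAX_CHARS + " characters fit");
        }
        long packed = 0L;
        for (int i = 0; i < s.length(); i++) {
            int index = ALPHABET.indexOf(s.charAt(i));
            if (index <= 0) { // -1: not in alphabet, 0: the reserved padding symbol
                throw new IllegalArgumentException("unsupported character at position " + i);
            }
            // Character i occupies bits 5*i through 5*i + 4.
            packed |= ((long) index) << (BITS_PER_CHAR * i);
        }
        return packed;
    }

    /** Decodes a long produced by encode back into the original String. */
    static String decode(long packed) {
        StringBuilder sb = new StringBuilder(MAX_CHARS);
        for (int i = 0; i < MAX_CHARS; i++) {
            int index = (int) ((packed >>> (BITS_PER_CHAR * i)) & MASK);
            if (index == 0) {
                break; // padding reached, so the string ends here
            }
            sb.append(ALPHABET.charAt(index));
        }
        return sb.toString();
    }
}
```

For example, the 12-character string `"hello world."` round-trips through `encode` and `decode`, while `"Hello"` is rejected because uppercase letters are not in the assumed alphabet. The trade-off is the one the answer describes: more characters per long, but a much smaller character set.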

Encoding 128 characters in 64 bits is impossible in general (specific cases may work with compression). With 64 bits, the best you can do is represent 64 characters, and only if you limit your encoding to 2 code points and represent each character as a single bit.
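The figures quoted in this answer follow one pattern. As a sketch, for strings of exactly n characters over an alphabet of k code points, a lossless encoding into 64 bits needs k^n to be at most 2^64:

```latex
\[
k^{n} \le 2^{64}
\;\Longleftrightarrow\;
n \le \frac{64}{\log_2 k},
\qquad
n_{\max}(k) = \left\lfloor \frac{64}{\log_2 k} \right\rfloor
\]
\[
n_{\max}(128) = \left\lfloor \tfrac{64}{7} \right\rfloor = 9,
\qquad
n_{\max}(32) = \left\lfloor \tfrac{64}{5} \right\rfloor = 12,
\qquad
n_{\max}(2) = \left\lfloor \tfrac{64}{1} \right\rfloor = 64
\]
```

Supporting variable-length strings costs a little extra, since the length (or a padding symbol) must also be encoded.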

Compression may be able to pull it off for certain strings, but not for all possible Strings of 128 characters.
