What are the implications of using base 40 to encode a String?

Original 2012-09-09 14:20:26 6 1 java/ compression/ redis/ base64/ base32

I've seen it suggested that Base 40 encoding can be used to compress Strings (in Java, to send to a Redis instance, FWIW), and a quick test shows it is more efficient for some of the data I'm using than an alternative I'm considering: Smaz.

Is there any reason to prefer base 32 or 64 encoding over 40? Are there any disadvantages, and can an encoding like this be lossless?

1 answer

Base 40 gives you 36 symbols for letters (probably lower case, unless your application uses upper case most of the time) and digits, and then four more for punctuation and shifts. You can make it lossless by making one of the remaining four an escape, so that the next one or two characters represent a byte that is not among the other 39. Another good approach, if you tend to have runs of upper-case characters, is a shift-lock character that toggles between upper and lower case.
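The escape idea can be sketched as follows. This is only an illustration under assumptions: the 39-symbol table `DIRECT`, the choice of digit 39 as the escape, and the class and method names are all hypothetical, and every escaped byte is written as two base-40 digits (enough, since 40 * 40 = 1600 covers all 256 byte values). A shift lock is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

final class Base40Escape {
    // Hypothetical table of 39 bytes that get a base-40 digit of their own.
    static final String DIRECT = " abcdefghijklmnopqrstuvwxyz0123456789.,";
    // Digit 39 is reserved (by assumption) as the escape marker.
    static final int ESCAPE = 39;

    /** Turns raw bytes into a sequence of base-40 digits (each in 0..39). */
    static List<Integer> toDigits(byte[] input) {
        List<Integer> digits = new ArrayList<>();
        for (byte b : input) {
            int v = b & 0xFF;
            int idx = DIRECT.indexOf((char) v);
            if (idx >= 0) {
                digits.add(idx);          // common byte: one digit
            } else {
                digits.add(ESCAPE);       // rare byte: escape + two digits
                digits.add(v / 40);
                digits.add(v % 40);
            }
        }
        return digits;
    }

    /** Inverse of toDigits; assumes a well-formed digit sequence. */
    static byte[] fromDigits(List<Integer> digits) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < digits.size(); i++) {
            int d = digits.get(i);
            if (d == ESCAPE) {
                out.write(digits.get(i + 1) * 40 + digits.get(i + 2));
                i += 2;                   // skip the two payload digits
            } else {
                out.write(DIRECT.charAt(d));
            }
        }
        return out.toByteArray();
    }
}
```

With this table an escaped byte costs three digits instead of one, so the scheme only pays off when most of the input falls inside the 39 direct symbols.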

40 is a convenient base, since three base-40 digits fit nicely in two bytes: 40^3 (64000) is a smidge less than 2^16 (65536).
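A minimal sketch of that packing, assuming a hypothetical 40-symbol alphabet (space, a-z, 0-9 and three punctuation marks) and big-endian byte order; the class and method names are made up for illustration. Characters outside the alphabet are simply rejected here, and a short final group is padded with symbol 0.

```java
final class Base40Pack {
    // Hypothetical alphabet: index in this string is the base-40 digit.
    static final String ALPHABET = " abcdefghijklmnopqrstuvwxyz0123456789.,-";

    /** Packs every three symbols into one 16-bit value stored as two bytes. */
    static byte[] pack(String s) {
        int groups = (s.length() + 2) / 3;
        byte[] out = new byte[groups * 2];
        for (int g = 0; g < groups; g++) {
            int value = 0;
            for (int i = 0; i < 3; i++) {
                int pos = g * 3 + i;
                // Pad a short final group with digit 0 (space in this alphabet).
                int digit = pos < s.length() ? ALPHABET.indexOf(s.charAt(pos)) : 0;
                if (digit < 0) {
                    throw new IllegalArgumentException("not in alphabet: " + s.charAt(pos));
                }
                value = value * 40 + digit;   // at most 63999, so it fits in 16 bits
            }
            out[2 * g] = (byte) (value >>> 8);
            out[2 * g + 1] = (byte) value;
        }
        return out;
    }

    /** Unpacks two bytes at a time back into three symbols. */
    static String unpack(byte[] packed) {
        StringBuilder sb = new StringBuilder();
        for (int g = 0; g + 1 < packed.length; g += 2) {
            int value = ((packed[g] & 0xFF) << 8) | (packed[g + 1] & 0xFF);
            if (value >= 64000) {
                throw new IllegalArgumentException("not a valid base-40 group: " + value);
            }
            sb.append(ALPHABET.charAt(value / 1600));
            sb.append(ALPHABET.charAt(value / 40 % 40));
            sb.append(ALPHABET.charAt(value % 40));
        }
        return sb.toString();
    }
}
```

Because of the padding, a string whose length is not a multiple of three comes back with trailing spaces, so in practice the original length would have to be stored or the padding trimmed. The 1536 unused 16-bit values (64000 to 65535) are the "smidge" left over.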

What you should use depends on the statistics of your data.