
Reversible string compression PHP/C++

I would like to obfuscate some short text data in a way that is easy to learn and memorize.

So I'm looking for an algorithm, implementable in PHP, that compresses a string of about 25 characters into a string of about 8 characters and that can be reversed in C++.

Does anyone know the name of a suitable algorithm, or have another idea?

EDIT: Everything is lowercase, plus two special characters.

Since the text data consists only of lowercase letters and two special characters, there are only 28 different characters to consider.

We can design the encoding around a fixed-width bit representation. With 5 bits there are 2^5 = 32 possible combinations, enough to uniquely represent 32 different symbols, so 5 bits per symbol are sufficient for our 28 symbols.
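A quick way to double-check this sizing is a small C++ helper. This is only a sketch; `bitsPerSymbol` and `packedBytes` are hypothetical names, not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>

// Smallest bit width w such that 2^w >= symbolCount, i.e. ceil(log2(symbolCount)).
// Uses integer shifts rather than floating-point log2 to avoid rounding issues.
unsigned bitsPerSymbol(unsigned symbolCount) {
    // Upper bound keeps the shift below 32 bits (no undefined behaviour).
    assert(symbolCount >= 1 && symbolCount <= (1u << 31));
    unsigned width = 0;
    while ((1u << width) < symbolCount) {
        ++width;
    }
    return width;
}

// Whole bytes needed to store `length` symbols of `width` bits each (rounded up).
std::size_t packedBytes(std::size_t length, unsigned width) {
    return (length * width + 7) / 8;
}
```

For example, `bitsPerSymbol(28)` returns 5, and `packedBytes(25, 5)` returns 16 (125 bits rounded up to whole bytes).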

a => 00000
b => 00001
c => 00010
......
......
......
y => 11000
z => 11001
special-character-1 => 11010
special-character-2 => 11011
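The table above could be expressed in C++ roughly as follows. This sketch assumes, hypothetically, that the two special characters are `_` (code 26) and `.` (code 27), since the question doesn't say which ones they are; the function names are also made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical choice of the two special characters (the question doesn't name them).
constexpr char kSpecial1 = '_';
constexpr char kSpecial2 = '.';

// Symbol -> 5-bit code: 'a'..'z' -> 0..25, kSpecial1 -> 26, kSpecial2 -> 27.
// Assumes 'a'..'z' are contiguous, as they are in ASCII.
std::uint8_t symbolToCode(char c) {
    if (c >= 'a' && c <= 'z') {
        return static_cast<std::uint8_t>(c - 'a');
    }
    if (c == kSpecial1) return 26;
    if (c == kSpecial2) return 27;
    throw std::invalid_argument("character is outside the 28-symbol alphabet");
}

// 5-bit code -> symbol; the inverse of symbolToCode.
char codeToSymbol(std::uint8_t code) {
    assert(code < 32 && "codes are 5 bits wide");
    if (code < 26) {
        return static_cast<char>('a' + code);
    }
    if (code == 26) return kSpecial1;
    if (code == 27) return kSpecial2;
    // Codes 28..31 (11100..11111) are not assigned in this table.
    throw std::invalid_argument("unassigned 5-bit code");
}
```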

With this encoding scheme, we need only 25 * 5 = 125 bits for the complete text. Since 125 / 8 = 15.625, that rounds up to 16 bytes (raw bytes, not necessarily printable characters), so 16 rather than the 8 you asked for, sorry.
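A minimal sketch of the packing step in C++, assuming the same hypothetical table (`a`–`z` → 0–25, `_` → 26, `.` → 27) and a most-significant-bit-first layout. The bit order is a choice made here, not something the answer specifies; whatever encoder runs on the PHP side would have to use exactly the same layout for a C++ decoder to read it back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical table: 'a'..'z' -> 0..25, '_' -> 26, '.' -> 27.
std::uint8_t toCode(char c) {
    if (c >= 'a' && c <= 'z') return static_cast<std::uint8_t>(c - 'a');
    if (c == '_') return 26;
    if (c == '.') return 27;
    throw std::invalid_argument("character is outside the 28-symbol alphabet");
}

// Packs each character as a 5-bit code, most significant bit first.
// The last byte is zero-padded, so 25 characters -> 125 bits -> 16 bytes.
std::vector<std::uint8_t> pack5(const std::string& text) {
    std::vector<std::uint8_t> out((text.size() * 5 + 7) / 8, 0);
    std::size_t bitPos = 0;  // index of the next bit to write
    for (char c : text) {
        const std::uint8_t code = toCode(c);
        for (int b = 4; b >= 0; --b, ++bitPos) {
            if ((code >> b) & 1u) {
                out[bitPos / 8] |= static_cast<std::uint8_t>(0x80u >> (bitPos % 8));
            }
        }
    }
    assert(bitPos == text.size() * 5);
    return out;
}
```

For instance, `pack5("abc")` yields the two bytes `0x00 0x44`, and any 25-character input yields 16 bytes.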

Now you can recover the original string from these 16 bytes by applying the reverse mapping, provided the decoder also knows how many characters were encoded.

If a 16-byte reversible encoding is good enough for you, I can provide a C++ implementation.

Impossible.

If we assume that the original strings contain only the letters A–Z, there are 26^25 ≈ 2.37 × 10^35 (about 237 million billion billion billion) possible input strings.

If we then generously allow the eight-character outputs to contain any letter, uppercase or lowercase, or any digit (26 + 26 + 10 = 62 characters in total), there are 62^8 ≈ 2.18 × 10^14 (about 218 trillion) possible outputs.

That is roughly 10^21 times fewer! By the pigeonhole principle, the compression scheme you're asking for is impossible: there are vastly more possible input strings than outputs, so there is no way to reversibly turn every input string into an output and back.
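The counting argument is easy to reproduce numerically. Because 26^25 overflows every built-in integer type, this hypothetical helper works with base-10 logarithms instead; it is a sketch for checking the orders of magnitude, not part of the original answer.

```cpp
#include <cassert>
#include <cmath>

// log10 of alphabetSize^length: the number of distinct strings of `length`
// symbols over an alphabet of `alphabetSize` symbols, in orders of magnitude.
double log10Count(double alphabetSize, double length) {
    assert(alphabetSize > 0.0 && length >= 0.0);
    return length * std::log10(alphabetSize);
}

// Orders of magnitude by which the possible inputs outnumber the possible
// outputs; any positive value means some inputs must share an output.
double pigeonholeGap(double inAlphabet, double inLength,
                     double outAlphabet, double outLength) {
    return log10Count(inAlphabet, inLength) - log10Count(outAlphabet, outLength);
}
```

Comparing `log10Count(26, 25)` with `log10Count(62, 8)` gives the size of the gap in orders of magnitude; using the question's 28-symbol alphabet instead of 26 only makes it larger.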

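Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM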