Short, case-insensitive string obfuscation strategy

Question

I am looking for a way to identify (i.e. encode and decode) a set of Java strings with a single token. The identification should not involve DB persistence. So far I have looked into Base64 encoding and DES encryption, but neither is optimal with respect to the following requirements:

Token should be as short as possible
Token should be case-insensitive
Token should survive a URLEncoder/URLDecoder round trip (i.e. it will be used in URLs)

Is Base32 my best shot, or are there better options? Note that I'm primarily interested in shortening and obfuscating the set; encryption/security is not important.

Answer 1

What is the structure of the text (i.e. the set of strings)? You could use your knowledge of it to encode it in a shorter form. E.g. if you have a large decimal number such as "1234567890", you could convert it to base 36, which will be shorter.
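To illustrate the base-36 idea, here is a minimal Java sketch. It assumes the string is a non-negative decimal number that fits in a long; the class and method names are made up for this example.

```java
// Sketch only: assumes the input is a decimal number that fits in a long.
// Base36Sketch, encode and decode are hypothetical names.
final class Base36Sketch {

    // Re-express a decimal string in base 36 (digits 0-9 and letters a-z).
    static String encode(String decimal) {
        return Long.toString(Long.parseLong(decimal), 36);
    }

    // Long.parseLong accepts letters in either case for radix 36,
    // so the token is case-insensitive without extra work.
    static String decode(String token) {
        return Long.toString(Long.parseLong(token, 36));
    }

    public static void main(String[] args) {
        String token = encode("1234567890");
        System.out.println("1234567890 ==> " + token); // prints 1234567890 ==> kf12oi
        System.out.println(decode(token.toUpperCase()));
    }
}
```

Ten decimal digits become six characters here, and the token only contains digits and letters, so it needs no escaping in a URL.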

Otherwise it looks like you are trying to invent a universal archiver.

If you don't care about length, then yes, processing by an alphabet-based encoder (such as Base32) is the only choice.

Also, if the text is large enough, maybe you could save some space by gzipping it.
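Whether gzipping pays off is easy to measure. The following is a rough sketch using java.util.zip; the class name and the sample text are assumptions for illustration. Note that the compressed bytes are binary, so they would still need a text encoding (such as Base32) before going into a URL.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch only: GzipSketch and the sample text are hypothetical.
final class GzipSketch {

    static byte[] gzip(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    static String gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gz.read(buffer)) != -1)
                out.write(buffer, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A long, repetitive text compresses well
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 50; i++)
            sb.append("user=alice;role=admin;");
        String longText = sb.toString();
        System.out.println("long:  " + longText.length() + " chars -> " + gzip(longText).length + " bytes");

        // A short text gets longer because of the gzip header and trailer
        System.out.println("short: 3 chars -> " + gzip("abc").length + " bytes");
    }
}
```

For short inputs the fixed gzip header and trailer outweigh any savings, which is why this only helps when the text is large enough.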

Answer 2

Rot13 obfuscates but does not shorten. Zip shortens (usually) but does not survive the URL round trip. Encryption will not shorten, and may lengthen. Hashing shortens but is one-way. You do not have an easy problem. Base32 is case-insensitive but takes more space than Base64, which is case-sensitive. I suspect that you are going to have to drop or modify your requirements. Which requirements are most important and which are least important?
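For reference, Base32 emits 8 characters per 5 input bytes, while Base64 emits 4 characters per 3 bytes. Since the JDK ships java.util.Base64 but no Base32 codec, here is a minimal sketch of an unpadded Base32 codec using the RFC 4648 alphabet (A-Z, 2-7). The class and method names are assumptions, not an existing library API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Sketch only: Base32Sketch is a hypothetical name, not a JDK class.
final class Base32Sketch {
    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

    // Encode without '=' padding so the token stays short and URL-safe.
    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder();
        int buffer = 0, bits = 0;
        for (byte b : data) {
            buffer = (buffer << 8) | (b & 0xff);
            bits += 8;
            while (bits >= 5) {
                out.append(ALPHABET.charAt((buffer >> (bits - 5)) & 31));
                bits -= 5;
            }
        }
        if (bits > 0) // left-align the remaining bits in a final 5-bit group
            out.append(ALPHABET.charAt((buffer << (5 - bits)) & 31));
        return out.toString();
    }

    // Upper-casing first makes decoding case-insensitive.
    static byte[] decode(String token) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int buffer = 0, bits = 0;
        for (char c : token.toUpperCase(Locale.ROOT).toCharArray()) {
            int value = ALPHABET.indexOf(c);
            if (value < 0)
                throw new IllegalArgumentException("Not a Base32 character: " + c);
            buffer = (buffer << 5) | value;
            bits += 5;
            if (bits >= 8) {
                out.write((buffer >> (bits - 8)) & 0xff);
                bits -= 8;
            }
        }
        return out.toByteArray(); // leftover padding bits are dropped
    }

    public static void main(String[] args) {
        String token = encode("foobar".getBytes(StandardCharsets.UTF_8));
        System.out.println(token); // prints MZXW6YTBOI
        System.out.println(new String(decode(token.toLowerCase(Locale.ROOT)), StandardCharsets.UTF_8));
    }
}
```

The token uses only letters and digits, so URL encoding leaves it unchanged, and the lower-cased token decodes to the same bytes.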

Answer 3

I have spent some time on this and I have a good solution for you.

Encode as base64, then as a custom base32 that uses 0-9a-v. Essentially, you lay out the bits 6 at a time (your chars are 0-9a-zA-Z), then encode them 5 at a time. This leads to hardly any extra space: the token is about 20% longer than the input. For example, ABCXYZdefxyz123789 (18 characters) encodes as i9crnsuj9ov1h8o4433i14 (22 characters).

Here's an implementation that works, including some test code that shows it is case-insensitive:

import org.junit.Assert; // JUnit 4 is needed only for the checks in main

public class TokenCodec { // class name is arbitrary
    // 62 symbols; 6 bits can index 64, so up to 2 more chars can be added if you want to.
    // The first 32 symbols (0-9a-v) are also the alphabet of the case-insensitive token.
    static String chars = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private static String decodeToken(String encoded) {
        // Lay out the bits 5 at a time; lower-casing first makes decoding case-insensitive
        StringBuilder sb = new StringBuilder();
        for (byte b : encoded.toLowerCase().getBytes())
            sb.append(asBits(chars.indexOf(b), 5));

        // Drop the zero bits that generateToken added as padding
        sb.setLength(sb.length() - (sb.length() % 6));

        // Consume it 6 bits at a time
        int length = sb.length();
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < length; i += 6)
            result.append(chars.charAt(Integer.parseInt(sb.substring(i, i + 6), 2)));

        return result.toString();
    }

    private static String generateToken(String x) {
        // Lay out the bits 6 at a time (x may only contain characters from chars)
        StringBuilder sb = new StringBuilder();
        for (byte b : x.getBytes())
            sb.append(asBits(chars.indexOf(b), 6));

        // Round up to a multiple of 5 bits by padding with zeros
        sb.append("00000".substring(0, (5 - sb.length() % 5) % 5));

        // Consume it 5 bits at a time
        int length = sb.length();
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < length; i += 5)
            result.append(chars.charAt(Integer.parseInt(sb.substring(i, i + 5), 2)));

        return result.toString();
    }

    // Binary representation of index, left-padded with zeros to exactly width bits
    private static String asBits(int index, int width) {
        String bits = "000000" + Integer.toBinaryString(index);
        return bits.substring(bits.length() - width);
    }

    public static void main(String[] args) {
        String input = "ABCXYZdefxyz123789";
        String token = generateToken(input);
        System.out.println(input + " ==> " + token);
        Assert.assertEquals("mixed", input, decodeToken(token));
        Assert.assertEquals("lower", input, decodeToken(token.toLowerCase()));
        Assert.assertEquals("upper", input, decodeToken(token.toUpperCase()));
        System.out.println("pass");
    }
}
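
As a rough check on the "hardly any extra space" claim, the token length follows from the bit counts alone: each input character contributes 6 bits and each token character carries 5. The sketch below compares that with plain Base32 over the raw ASCII bytes (8 bits per character). The class and method names are made up for illustration.

```java
// Sketch only: TokenLengthSketch and its methods are hypothetical names.
final class TokenLengthSketch {

    // 6 bits per input char, 5 bits per token char, rounded up
    static int repackedLength(int inputChars) {
        return (inputChars * 6 + 4) / 5;
    }

    // Unpadded Base32 over the raw bytes: 8 bits per byte, 5 bits per token char
    static int plainBase32Length(int inputBytes) {
        return (inputBytes * 8 + 4) / 5;
    }

    public static void main(String[] args) {
        int n = "ABCXYZdefxyz123789".length(); // 18
        System.out.println("6-to-5 repacking: " + repackedLength(n));    // prints 22
        System.out.println("plain Base32:     " + plainBase32Length(n)); // prints 29
    }
}
```

The saving comes from restricting the input to a 62-character alphabet, so this only applies when the strings contain nothing but 0-9, a-z and A-Z.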

3 answers

Answer 1: score 2, accepted, 2011-11-15 10:48:11
Answer 2: score 2, 2011-11-15 10:53:37
Answer 3: score 1, 2011-11-18 09:35:45
