Encoding a UUID into a 15-character string in Java

I've seen several questions around here that are similar, but none are quite what I need. For reasons that unfortunately cannot be changed, I need to take a Java UUID and store it in a string that is 15 characters long. All of the base-conversion methods I have found can only reduce it to 22 characters at best, but I think it should be possible to make it shorter than that. Does anyone know how this could be done? The shorter the string, the better. Thanks!

A UUID consists of 128 bits. That can be stored in a Java String of 15 chars, because a Java char is 16 bits wide and holds one UTF-16 code unit. Not every 16-bit value can be used freely: the surrogate code units (U+D800 to U+DFFF) must come in pairs to encode the higher Unicode code points. But we only need a 9-bit payload per char (15 chars * 9 bits = 135 bits >= 128 bits).
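The 9-bit figure is just the integer ceiling of 128 / 15, the same formula that `uuidToStr15` computes as `bitsPerChar`. A minimal sketch (the `BitsPerChar` class and `minBitsPerChar` method are hypothetical names) that evaluates it for a few target lengths:

```java
/** Hypothetical helper: whole bits each char must carry so that `length` chars hold 128 bits. */
class BitsPerChar {
    static final int UUID_BITS = 128;

    // Integer ceiling of 128 / length, the same formula used in uuidToStr15.
    static int minBitsPerChar(int length) {
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive");
        }
        return (UUID_BITS + length - 1) / length;
    }

    public static void main(String[] args) {
        for (int length : new int[] {22, 16, 15, 8}) {
            int bits = minBitsPerChar(length);
            System.out.printf("%2d chars -> %2d bits/char (%d bits total)%n",
                    length, bits, bits * length);
        }
    }
}
```

For 22 chars it gives 6 bits per char, which matches the 22-character limit the question reports for base-conversion approaches; for 8 chars it would need a full 16 bits per char, which a Java char cannot supply freely because of the surrogate range.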

So we can store a 9-bit payload per char, for example starting at U+2000. The 512 possible payloads then map to U+2000 through U+21FF, which stays well clear of the surrogate range.

import java.util.UUID;

// Wrapper class (name chosen for illustration) so the answer's methods compile on their own.
public class Uuid15 {

    // Encodes the 128 bits of a UUID into 15 chars carrying 9 bits each, starting at U+2000.
    public static String uuidToStr15(UUID uuid) {
        // longs[0] holds the low 64 bits, longs[1] the high 64 bits.
        long[] longs = new long[2];
        longs[0] = uuid.getLeastSignificantBits();
        longs[1] = uuid.getMostSignificantBits();

        char[] chars = new char[15];
        // 15 chars x 9 bits payload == 135 >= 128.
        final int bitsPerChar = (128 + chars.length - 1) / chars.length;
        final int char0 = 0x2000;
        long mask = (1L << bitsPerChar) - 1;
        for (int i = 0; i < chars.length; ++i) {
            // The lowest 9 bits become this char's payload.
            int payload = (int) (longs[0] & mask);
            chars[i] = (char) (char0 + payload);
            // Shift the 128-bit value right by bitsPerChar across both longs.
            longs[0] >>>= bitsPerChar;
            longs[0] |= (longs[1] & mask) << (64 - bitsPerChar);
            longs[1] >>>= bitsPerChar;
        }
        return new String(chars);
    }

    // Decodes a string produced by uuidToStr15 back into the original UUID.
    public static UUID str15ToUuid(String s) {
        char[] chars = s.toCharArray();
        if (chars.length != 15) {
            throw new IllegalArgumentException(
                    "String should have length 15, not " + chars.length);
        }
        final int bitsPerChar = (128 + chars.length - 1) / chars.length;
        final int char0 = 0x2000;
        long mask = (1L << bitsPerChar) - 1;
        long[] longs = new long[2];
        // Start at the most significant char and shift the 128-bit value left.
        for (int i = chars.length - 1; i >= 0; --i) {
            int payload = chars[i] - char0;
            // Reject chars outside U+2000..U+21FF; a payload above the mask
            // would otherwise corrupt neighbouring bits.
            if (payload < 0 || payload > mask) {
                throw new IllegalArgumentException(
                        String.format("Char [%d] is wrong; U+%04X", i, (int) chars[i]));
            }
            longs[1] <<= bitsPerChar;
            longs[1] |= (longs[0] >>> (64 - bitsPerChar)) & mask;
            longs[0] <<= bitsPerChar;
            longs[0] |= payload;
        }
        return new UUID(longs[1], longs[0]);
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        System.out.println("UUID: " + uuid);
        String s = uuidToStr15(uuid);
        UUID uuid2 = str15ToUuid(s);
        System.out.println("Success: " + uuid2.equals(uuid));
    }
}
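The encoder above only ever emits chars from U+2000 (char0) to U+21FF (char0 + 511), far from the surrogate block U+D800..U+DFFF. A hedged sketch (the `PayloadRange` class is hypothetical) checks that, and also counts code points in the block that are worth knowing about: the range contains unassigned code points and invisible ones such as U+200B ZERO WIDTH SPACE and U+2028 LINE SEPARATOR, which some text-processing layers might trim or normalize, so the stored strings are best treated as opaque data.

```java
/** Inspects the 512-char block U+2000..U+21FF used for the 9-bit payloads. */
class PayloadRange {
    static final int CHAR0 = 0x2000;
    static final int SIZE = 1 << 9; // 512 distinct 9-bit payloads

    static int lastChar() {
        return CHAR0 + SIZE - 1;
    }

    // True if no char in the block is a UTF-16 surrogate code unit.
    static boolean hasNoSurrogates() {
        for (int c = CHAR0; c <= lastChar(); c++) {
            if (Character.isSurrogate((char) c)) {
                return false;
            }
        }
        return true;
    }

    // Counts code points in the block with the given general category, e.g. Character.FORMAT.
    static int countOfType(int type) {
        int n = 0;
        for (int c = CHAR0; c <= lastChar(); c++) {
            if (Character.getType(c) == type) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.printf("Range U+%04X..U+%04X%n", CHAR0, lastChar());
        System.out.println("No surrogates: " + hasNoSurrogates());
        // These counts depend on the Unicode version of the running JDK.
        System.out.println("Unassigned: " + countOfType(Character.UNASSIGNED));
        System.out.println("Format controls: " + countOfType(Character.FORMAT));
        System.out.println("Line/paragraph separators: "
                + (countOfType(Character.LINE_SEPARATOR) + countOfType(Character.PARAGRAPH_SEPARATOR)));
    }
}
```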

Of course these strings are not easy to write down or to type on a keyboard. For that, one would need to be more careful and pick suitable ranges of Unicode code points, usually at the cost of a longer string.
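Picking a typeable range is a trade-off between alphabet size and string length. A minimal sketch (the `MinLength` class is a hypothetical name) computes the shortest string that can represent all 2^128 values for a few alphabet sizes, using exact `BigInteger` arithmetic to avoid floating-point rounding near the boundary:

```java
import java.math.BigInteger;

/** Hypothetical helper: shortest string over `alphabetSize` symbols able to hold any 128-bit value. */
class MinLength {
    static final BigInteger UUID_VALUES = BigInteger.ONE.shiftLeft(128); // 2^128

    // Smallest n such that alphabetSize^n >= 2^128.
    static int minLength(int alphabetSize) {
        if (alphabetSize < 2) {
            throw new IllegalArgumentException("alphabet needs at least 2 symbols");
        }
        BigInteger base = BigInteger.valueOf(alphabetSize);
        BigInteger capacity = BigInteger.ONE;
        int n = 0;
        while (capacity.compareTo(UUID_VALUES) < 0) {
            capacity = capacity.multiply(base);
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // 16: hex; 62: [A-Za-z0-9]; 64: Base64; 85: Ascii85-style; 95: printable ASCII; 512: U+2000..U+21FF
        for (int size : new int[] {16, 62, 64, 85, 95, 512}) {
            System.out.printf("alphabet of %3d symbols -> %2d chars%n", size, minLength(size));
        }
    }
}
```

Under this sketch, hexadecimal needs 32 chars, the 62 alphanumerics and Base64 both need 22, and even all 95 printable ASCII characters only get down to 20. Reaching 15 requires at least 371 distinct symbols; the answer's 512 = 2^9 is the next power of two, which keeps the bit packing simple.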

Also, "15 chars" is exactly 30 bytes in UTF-16, but the physical size is larger in UTF-8: every char in U+2000..U+21FF takes 3 bytes there, so the string occupies 45 bytes.
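The byte counts can be checked with `String.getBytes`. A small sketch (the `EncodedSize` class is hypothetical) using a sample string of 15 copies of U+2190; any char from the payload block gives the same sizes:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Compares the encoded size of a 15-char string from the U+2000..U+21FF block. */
class EncodedSize {
    // 15 copies of U+2190 (LEFTWARDS ARROW).
    static String sample() {
        char[] chars = new char[15];
        Arrays.fill(chars, '\u2190');
        return new String(chars);
    }

    // UTF_16BE avoids the 2-byte byte-order mark that StandardCharsets.UTF_16 prepends.
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Code points U+0800..U+FFFF take 3 bytes each in UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println("chars:  " + s.length());   // 15
        System.out.println("UTF-16: " + utf16Bytes(s)); // 30
        System.out.println("UTF-8:  " + utf8Bytes(s));  // 45
    }
}
```

So if the target column is sized in bytes rather than characters, the UTF-8 figure is the one that matters.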

From the Java Language Specification, §3.10.5 String Literals (pay attention to the part about UTF-16 surrogate code units):

A string literal consists of zero or more characters enclosed in double quotes. Characters may be represented by escape sequences (§3.10.6) - one escape sequence for characters in the range U+0000 to U+FFFF, two escape sequences for the UTF-16 surrogate code units of characters in the range U+010000 to U+10FFFF. See §3.10.6 for the definition of EscapeSequence.

A string literal is always of type String (§4.3.3).

Every "character" in a Java String is a 16-bit UTF-16 code unit, so a String of length 15 occupies 30 bytes in UTF-16, not 15.
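The distinction the JLS draws between chars and code points shows up directly in the String API: `length()` counts 16-bit code units, while `codePointCount()` counts Unicode code points. A small illustration (the `CodeUnits` class is a hypothetical name):

```java
/** Shows that String.length() counts UTF-16 code units, not code points. */
class CodeUnits {
    // U+2190 is in the Basic Multilingual Plane: one char.
    static final String ARROW = "\u2190";
    // U+1F600 lies outside the BMP, so Java stores it as a surrogate pair (two chars).
    static final String EMOJI = new String(Character.toChars(0x1F600));

    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // prints "1 char(s), 1 code point(s)"
        System.out.println(ARROW.length() + " char(s), " + codePoints(ARROW) + " code point(s)");
        // prints "2 char(s), 1 code point(s)"
        System.out.println(EMOJI.length() + " char(s), " + codePoints(EMOJI) + " code point(s)");
        // prints "High surrogate first: true"
        System.out.println("High surrogate first: " + Character.isHighSurrogate(EMOJI.charAt(0)));
    }
}
```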

Perhaps you are thinking that in Java a character maps to a byte (an 8-bit value). It does not.

So if you want 8-bit units, you would use a byte[] array for the encoding instead. In fact, in real life, that is what we do when we want to encode things into 8-bit values (raw bytes as understood in, say, C's unsigned char).

But then, let's do some math. By definition, a UUID is a 128-bit value, and a 128-bit value is a sequence of 16 bytes (128 = 16 * 8).
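The 16-byte figure is easy to confirm: a UUID is exactly two longs, and packing them into a byte[] yields 16 bytes, one more than the 15 the question allows. A minimal sketch using `java.nio.ByteBuffer` (the `UuidBytes` class is a hypothetical name):

```java
import java.nio.ByteBuffer;
import java.util.UUID;

/** Packs a UUID into its 16 raw bytes (big-endian) and back. */
class UuidBytes {
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())  // bytes 0..7
                .putLong(uuid.getLeastSignificantBits()) // bytes 8..15
                .array();
    }

    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("Expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long msb = buf.getLong();
        long lsb = buf.getLong();
        return new UUID(msb, lsb);
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID();
        byte[] bytes = toBytes(uuid);
        System.out.println(bytes.length + " bytes, round trip ok: " + uuid.equals(fromBytes(bytes)));
    }
}
```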

So there is no way in hell that you can universally encode a UUID into 15 bytes: 15 bytes hold only 120 bits, i.e. 2^120 distinct values, which cannot cover all 2^128 possible UUIDs. UUID versions 1 and 2 are built from structured fields (a timestamp, a clock sequence and a node ID) that could be compressed or dropped, assuming the reader can correctly reconstruct what those "dropped" values are.

But once you use UUID v3, v4 or v5, forget it. Version 4 is essentially a sequence of random values, and versions 3 and 5 are hash outputs (MD5 and SHA-1), so in the general case they are practically incompressible.
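Even a v4 UUID does not carry a full 128 bits of randomness: RFC 4122 fixes 4 version bits and 2 variant bits, leaving 122 random bits. A minimal sketch (the `V4Bits` class and its methods are hypothetical names) reads those fields with `UUID.version()` and `UUID.variant()` and confirms that 122 bits still exceed the 120 bits available in 15 bytes:

```java
import java.util.UUID;

/** Counts the random bits of a version-4 UUID against a byte budget. */
class V4Bits {
    static final int UUID_BITS = 128;
    // RFC 4122 fixes the 4-bit version field and the 2-bit variant field of a v4 UUID.
    static final int FIXED_V4_BITS = 4 + 2;

    static int randomBitsV4() {
        return UUID_BITS - FIXED_V4_BITS; // 122
    }

    // True if `bits` bits of information fit into `bytes` bytes.
    static boolean fitsIn(int bits, int bytes) {
        return bits <= bytes * 8;
    }

    public static void main(String[] args) {
        UUID uuid = UUID.randomUUID(); // randomUUID() produces a version-4 UUID
        System.out.println("version=" + uuid.version() + ", variant=" + uuid.variant());
        int bits = randomBitsV4();
        System.out.println(bits + " random bits; fit in 15 bytes? " + fitsIn(bits, 15));
    }
}
```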

Basic arithmetic, then, tells us that we should not try to do that :)

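Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.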

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM