
How to convert an 18 Character String into a Unique ID?

I have an 18-character String that I need to convert into a unique long (in Java). A sample String would be: AAA2aNAAAAAAADnAAA

My String is actually an Oracle ROWID, so it can be broken down if need be, see: http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14220/datatype.htm#CNCPT713

The long generated (1) must be unique, as no two results can point to the same database row, and (2) must be reversible, so I can get the ROWID String back from the long.

Any suggestions on an algorithm to use would be welcome.

Oracle forum question on this from a few years ago: http://forums.oracle.com/forums/thread.jspa?messageID=1059740

Ro

You can't, with those requirements.

18 characters of (assuming) upper and lower case letters has 56^18, or about 2.93348915 × 10^31, combinations. This is (way) more than the approximately 1.84467441 × 10^19 combinations available in 64 bits.

UPDATE: I had the combinatorics wrong, heh. Same result though.
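
The comparison can be checked with exact integer arithmetic. This is only a sketch: the class and method names are made up for illustration, and the alphabet size of 56 is simply the figure used above.

```java
import java.math.BigInteger;

class RowidCounting {
    // Number of distinct strings of the given length over an alphabet of the given size.
    static BigInteger combinations(int alphabetSize, int length) {
        return BigInteger.valueOf(alphabetSize).pow(length);
    }

    // Number of distinct values a 64-bit long can take: 2^64.
    static BigInteger longValues() {
        return BigInteger.ONE.shiftLeft(64);
    }

    public static void main(String[] args) {
        BigInteger strings = combinations(56, 18); // 56 is the alphabet size assumed above
        System.out.println("18-char strings: " + strings);
        System.out.println("64-bit values:   " + longValues());
        System.out.println("fits in a long:  " + (strings.compareTo(longValues()) <= 0));
    }
}
```

Any alphabet of 12 or more symbols already gives more than 2^64 strings of length 18, so the exact alphabet size does not change the conclusion.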

Just create a map (dictionary / hashtable) that maps ROWID strings to an (incremented) long. If you keep two such dictionaries and wrap them up in a nice class, you will have a bidirectional lookup between the strings and the long IDs.

Pseudocode:

class BidirectionalLookup:
    dict<string, long> stringToLong
    dict<long, string> longToString
    long lastId

    addString(string): long
        newId = atomic(++lastId)
        stringToLong[string] = newId
        longToString[newId] = string
        return newId

    lookUp(string): long
        return stringToLong[string]

    lookUp(long): string
        return longToString[long]
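
In Java, the pseudocode above might look roughly like this. It is a sketch rather than a finished class: it keeps everything in memory, uses plain synchronized methods in place of atomic(), and the check for an already-registered string is an addition that the pseudocode does not have.

```java
import java.util.HashMap;
import java.util.Map;

class BidirectionalLookup {
    private final Map<String, Long> stringToLong = new HashMap<>();
    private final Map<Long, String> longToString = new HashMap<>();
    private long lastId = 0;

    // Returns the ID already assigned to this string, or assigns the next one.
    public synchronized long addString(String rowid) {
        Long existing = stringToLong.get(rowid);
        if (existing != null) {
            return existing;
        }
        long newId = ++lastId;
        stringToLong.put(rowid, newId);
        longToString.put(newId, rowid);
        return newId;
    }

    // Returns null if the string has never been added.
    public synchronized Long lookUp(String rowid) {
        return stringToLong.get(rowid);
    }

    // Returns null if the ID has never been handed out.
    public synchronized String lookUp(long id) {
        return longToString.get(id);
    }
}
```

The IDs are only meaningful for as long as the two maps live, so they would have to be persisted somewhere if the mapping must survive a restart.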

Your 18-character String is a base-64 encoding that carries a total of 108 bits of information, almost twice the 64 bits of a long. That is a problem if we want to represent every possible key and keep the representation reversible.

The string can be broken down into 4 numbers easily enough. Each of those 4 numbers represents something - a block number, an offset in that block, whatever. If you manage to establish upper limits on the underlying quantities such that you know larger numbers will not occur (i.e. if you find a way to identify at least 44 of those bits that will always be 0), then you can map the rest onto a long, reversibly.

Another possibility would be to relax the requirement that the equivalent be a long. How about a BigInteger? That would make it easy.
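
A rough sketch of the BigInteger route follows. The class and method names are hypothetical, and the digit alphabet (A-Z, a-z, 0-9, +, / with A as zero) is an assumption about how Oracle encodes extended ROWIDs that should be checked against your database version.

```java
import java.math.BigInteger;

class RowidBigInteger {
    // Assumed ROWID digit alphabet, in digit order 0..63.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    static final BigInteger BASE = BigInteger.valueOf(64);

    // Reads the 18 characters as one base-64 number (at most 108 bits).
    static BigInteger toBigInteger(String rowid) {
        BigInteger value = BigInteger.ZERO;
        for (char c : rowid.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("not a ROWID character: " + c);
            }
            value = value.multiply(BASE).add(BigInteger.valueOf(digit));
        }
        return value;
    }

    // Writes the number back as exactly 18 characters, restoring leading 'A' digits.
    static String toRowid(BigInteger value) {
        char[] out = new char[18];
        for (int i = 17; i >= 0; i--) {
            BigInteger[] quotientAndRemainder = value.divideAndRemainder(BASE);
            out[i] = ALPHABET.charAt(quotientAndRemainder[1].intValue());
            value = quotientAndRemainder[0];
        }
        return new String(out);
    }
}
```

Padding back to a fixed 18 characters is what keeps the mapping reversible, since leading 'A' characters are leading zeros and vanish in the numeric form.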

I'm assuming that's a case-sensitive alphanumeric string, and so drawn from the set [a-zA-Z0-9]*

In that case you have

26 + 26 + 10 = 62 

possible values for each character.

62 < 64 = 2^6

In other words, you need (at least) 6 bits to store each of the 18 characters of the key.

6 * 18 = 108 bits 

to store the entire string uniquely.

108 bits = (108 / 8) bytes = 13.5 bytes.

Therefore, as long as your data type can store at least 13.5 bytes, you can fairly simply define a mapping:

  1. Map from raw ASCII for each character to a representation using only 6 bits
  2. Concatenate all 18 reduced representations into a single 14-byte value
  3. Cast this to your final data value
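
The three steps can be sketched as below, using a byte[14] as the final data value. The names are hypothetical, and the ordering of the 62-character alphabet is an arbitrary choice made for this example; any fixed ordering works as long as packing and unpacking agree.

```java
class SixBitPacker {
    // The 62 characters assumed above; each index fits in 6 bits.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

    // Steps 1 and 2: map each character to 6 bits and concatenate into 14 bytes.
    static byte[] pack(String key) {
        if (key.length() != 18) {
            throw new IllegalArgumentException("expected 18 characters");
        }
        byte[] out = new byte[14]; // 108 bits rounded up to whole bytes
        int bitPos = 0;
        for (int i = 0; i < key.length(); i++) {
            int code = ALPHABET.indexOf(key.charAt(i));
            if (code < 0) {
                throw new IllegalArgumentException("unexpected character: " + key.charAt(i));
            }
            for (int b = 5; b >= 0; b--, bitPos++) {
                if (((code >> b) & 1) != 0) {
                    out[bitPos / 8] |= 0x80 >> (bitPos % 8);
                }
            }
        }
        return out;
    }

    // Reverse mapping: read 6 bits at a time and look the character up again.
    static String unpack(byte[] packed) {
        StringBuilder sb = new StringBuilder(18);
        int bitPos = 0;
        for (int i = 0; i < 18; i++) {
            int code = 0;
            for (int b = 0; b < 6; b++, bitPos++) {
                int bit = (packed[bitPos / 8] >> (7 - bitPos % 8)) & 1;
                code = (code << 1) | bit;
            }
            sb.append(ALPHABET.charAt(code));
        }
        return sb.toString();
    }
}
```

The last 4 bits of the 14th byte are unused padding, which is where the half byte in 13.5 goes.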

Obviously, Java has no primitive integer type larger than the 8-byte long. So if you have to use a long, it is NOT possible to map the strings uniquely, unless something else reduces the space of valid input strings.

Theoretically, you can't represent a ROWID in a long (8 bytes). However, depending on the size of your database (the whole server, not only your table), you might be able to encode it into a long.

Here is the layout of a ROWID:

   OOOOOO-FFF-BBBBBB-RRR

Where O is ObjectID, F is FileNo, B is Block and R is Row Number. All of them are Base64-encoded. As you can see, O and B can have 36 bits each, and F and R can have 18 bits each.

If your database is not huge, you can use 2 bytes for each part. Basically, your ObjectID and block number will be limited to 64K. Our DBA believes our database would have to be several orders of magnitude bigger for us to get close to these limits.

I would suggest you find the maximum of each part in your database and see if you are close. I wouldn't use a long if they are anywhere near the limit.
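
A sketch of the 2-bytes-per-part idea is shown below. The class and method names are hypothetical, and the digit alphabet (A-Z, a-z, 0-9, +, / with A as zero) is an assumption about Oracle's encoding. The code refuses any ROWID whose parts do not fit in 16 bits, so a value is never silently truncated.

```java
class RowidPacker {
    // Assumed ROWID digit alphabet, in digit order 0..63.
    static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Decodes one part (a run of base-64 digits) into a number.
    static long decode(String part) {
        long value = 0;
        for (char c : part.toCharArray()) {
            int digit = ALPHABET.indexOf(c);
            if (digit < 0) {
                throw new IllegalArgumentException("not a ROWID character: " + c);
            }
            value = (value << 6) | digit;
        }
        return value;
    }

    // Encodes a number back into a part of the given width.
    static String encode(long value, int width) {
        char[] out = new char[width];
        for (int i = width - 1; i >= 0; i--) {
            out[i] = ALPHABET.charAt((int) (value & 63));
            value >>>= 6;
        }
        return new String(out);
    }

    // Packs object, file, block and row into one long, 16 bits each.
    static long pack(String rowid) {
        long[] parts = {
            decode(rowid.substring(0, 6)),   // OOOOOO
            decode(rowid.substring(6, 9)),   // FFF
            decode(rowid.substring(9, 15)),  // BBBBBB
            decode(rowid.substring(15, 18))  // RRR
        };
        long packed = 0;
        for (long part : parts) {
            if (part > 0xFFFF) {
                throw new IllegalArgumentException("part does not fit in 16 bits: " + part);
            }
            packed = (packed << 16) | part;
        }
        return packed;
    }

    // Splits the long back into its four 16-bit parts and re-encodes them.
    static String unpack(long packed) {
        return encode((packed >>> 48) & 0xFFFF, 6)
             + encode((packed >>> 32) & 0xFFFF, 3)
             + encode((packed >>> 16) & 0xFFFF, 6)
             + encode(packed & 0xFFFF, 3);
    }
}
```

Under this alphabet assumption, the object part of the sample ROWID from the question (AAA2aN) decodes to 222861, which is already above 64K, so pack would reject it. That is the kind of result the check against your own data is meant to surface before committing to a fixed layout.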

Found a way to extract the ROWID from the database in a different format:

SQL> select DBMS_ROWID.ROWID_TO_RESTRICTED( ROWID, 1 ) FROM MYTABLE;

0000EDF4.0001.0000
0000EDF4.0002.0000
0000EDF4.0004.0000
0000EDF4.0005.0000
0000EDF4.0007.0000
0000EDF5.0000.0000
0000EDF5.0002.0000
0000EDF5.0003.0000

Then convert it to a number like so:

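// Converts a restricted ROWID such as "0000EDF4.0001.0000" to a long.
static long rowidToLong( final String rowid ) {
  // Strip the dots, leaving 16 hex digits (the dot must be escaped in the regex)
  final String hexNum = rowid.replaceAll( "\\.", "" );
  // The low 15 hex digits (60 bits) always fit in a positive long
  final long lowerValue = Long.parseLong( hexNum.substring( 1 ), 16 );
  long upperNibble = Integer.parseInt( hexNum.substring( 0, 1 ), 16 );
  if ( upperNibble >= 8 ) {
    // Top bit set (ROWID >= 80000000.0000.0000): map onto the negative longs
    upperNibble -= 8;
    return -( 9223372036854775807L - ( lowerValue - 1 + ( upperNibble << 60 ) ) );
  } else {
    return ( lowerValue + ( upperNibble << 60 ) );
  }
}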

Then convert that number back to String format like so:

// Converts the long back to a restricted ROWID string.
static String longToRowid( final long featureID ) {
  // 16 zero-padded, upper-case hex digits; negative values print as unsigned
  final StringBuilder sb = new StringBuilder( String.format( "%016X", featureID ) );
  sb.insert( 8, '.' );
  sb.insert( 13, '.' );
  return sb.toString();
}
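
On Java 8 or later, the same round trip can be written more compactly with the JDK's unsigned parsing. This is a sketch with hypothetical class and method names, not the code used above; it treats the 16 hex digits as one unsigned 64-bit value, so ROWIDs with the top bit set come out as negative longs.

```java
class RestrictedRowid {
    // "0000EDF4.0001.0000" -> long, reading the 16 hex digits as an unsigned value.
    static long toLong(String rowid) {
        return Long.parseUnsignedLong(rowid.replace(".", ""), 16);
    }

    // long -> "0000EDF4.0001.0000", zero-padded and upper case.
    static String toRowid(long id) {
        String hex = String.format("%016X", id);
        return hex.substring(0, 8) + "." + hex.substring(8, 12) + "." + hex.substring(12);
    }

    public static void main(String[] args) {
        String rowid = "0000EDF4.0001.0000";
        long id = toLong(rowid);
        System.out.println(id + " <-> " + toRowid(id));
    }
}
```

Because the restricted format is exactly 64 bits of hex (8 + 4 + 4 digits), every such string maps to a distinct long and back.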

Cheers for all the responses.

This sounds ... icky, but I don't know your context, so I'm trying not to pass judgement. 8)

Have you considered converting the characters in the string into their ASCII equivalents?

ADDENDUM: Of course, this would require truncating semi-superfluous characters to fit, which, from the comments, sounds like an option you may have.
