简体   繁体   中英

How to convert an 18 Character String into a Unique ID?

I have an 18 Character String that I need to convert into a unique long (in Java). A sample String would be: AAA2aNAAAAAAADnAAA

My String is actually an Oracle ROWID, so it can be broken down if needs be, see: http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14220/datatype.htm#CNCPT713

The long number generated, (1) Must be unique, as no two results can point to the same database row and (2) Must be reversible, so I can get the ROWID String back from the long?

Any suggestions on an algorithm to use would be welcome.

Oracle forum question on this from a few years ago : http://forums.oracle.com/forums/thread.jspa?messageID=1059740

Ro

You can't, with those requirements.

18 characters of (assuming) upper and lower case letters has 56 18 or about 2.93348915 × 103 31 combinations. This is (way) more than the approximate 1.84467441 × 10 19 combinations available among 64 bits.

UPDATE: I had the combinatorics wrong, heh. Same result though.

Just create a map (dictionary / hashtable) that maps ROWID strings to an (incremented) long. If you keep two such dictionaries and wrap them up in a nice class, you will have a bidirectional lookup between the strings and the long IDs.

Pseudocode:

class BidirectionalLookup:
    dict<string, long> stringToLong
    dict<long, string> longToString
    long lastId

    addString(string): long
        newId = atomic(++lastId)
        stringToLong[string] = newId
        longToString[newId] = string
        return newId

    lookUp(string): long
        return stringToLong[string]

    lookUp(long): string
        return longToString[long]

Your String of 18 characters representing a base 64 encoding represents a total of 108 bits of information, which is almost twice that of long's 64. We have a bit of a problem here if we want to represent every possible key and have the representation be reversible.

The string can be broken down into 4 numbers easily enough. Each of those 4 numbers represents something - a block number, an offset in that block, whatever. If you manage to establish upper limits on the underlying quantities such that you know larger numbers will not occur (ie if you find a way to identify at least 44 of those bits that will always be 0), then you can map the rest onto a long, reversibly.

Another possibility would be to relax the requirement that the equivalent be a long . How about a BigInteger ? That would make it easy.

I'm assuming that's a case-insensitive alpha-numeric string, and so drawn from the set [a-zA-Z0-9]*

In that case you have

26 + 26 + 10 = 62 

possible values for each character.

62 < 64 = 2^6

In other words you need (at least) 6 bits to store each of the 18 characters of the key.

6 * 18 = 108 bits 

to store the entire string uniquely.

108 bits  = (108 / 8) = 13.5 bytes.

Therefore as long as your data type can store at least 13.5 bytes then you can fairly simply define a mapping:

  1. Map from raw ASCII for each character to a representation using only 6 bits
  2. Concatenate all 18 reduced representations to a sinlde 14 byte value
  3. Cast this to your final data value

Obviously Java has nothing more than an 8 byte long . So if you have to use a long then it is NOT possible to uniquely map the strings, unless there is something else which reduces the space of valid input strings.

Theoretically, you can't represent ROWID in a long (8 bytes). However, depending on the size of your databases (the whole server, not only your table), you might be able to encode it into a long.

Here is the layout of ROWID,

   OOOOOO-FFF-BBBBBB-RRR

Where O is ObjectID. F is FileNo. B is Block and R is Row Number. All of them are Base64-encoded. As you can see O & B can have 36-bits and B&R can have 18.

If your database is not huge, you can use 2 byte for each part. Basically, your ObjectId and block number will be limited to 64K. Our DBA believes our database has to be several magnitude bigger for us to get close to these limits.

I would suggest you find max of each part in your database and see if you are close. I wouldn't use long if they are anywhere near the limit.

Found a way to extract the ROWID in a different manner from the database....

SQL> select DBMS_ ROWID.ROWID_ TO_RESTRICTED( ROWID, 1 ) FROM MYTABLE;

0000EDF4.0001.0000 0000EDF4.0002.0000 0000EDF4.0004.0000 0000EDF4.0005.0000 0000EDF4.0007.0000 0000EDF5.0000.0000 0000EDF5.0002.0000 0000EDF5.0003.0000

Then convert it to a number like so :

final String hexNum = rowid.replaceAll( "\.", "" );
final long lowerValue = Long.parseLong( hexNum.substring( 1 ), 16 );
long upperNibble = Integer.parseInt( hexNum.substring( 0, 1 ), 16 );
if ( upperNibble >= 8 ) {
  //Catch Case where ROWID > 8F000000.0000.0000
  upperNibble -= 8;
  return -( 9223372036854775807L - ( lowerValue - 1 + ( upperNibble << 60 ) ) );
} else {
  return ( lowerValue + ( upperNibble << 60 ) );
}

Then reverse that number back to String format like so:

String s = Long.toHexString( featureID );
//Place 0's at the start of the String making a Strnig of size 16
s = StringUtil.padString( s, 16, '0', true ); 
StringBuffer sb = new StringBuffer( s );
sb.insert( 8, '.' );
sb.insert( 13, '.' );

return sb.toString();

Cheers for all the responses.

This sounds ... icky, but I don't know your context so trying not to pass judgement. 8)

Have you considered converting the characters in the string into their ASCII equivalents?

ADDENDUM: Of course required truncating out semi-superflous characters to fit, which sounds like an option you may have from comments.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM