
Generating unique fixed integer ids from array of ids

So here is the situation: I have an array of objects, each marked with a unique integer id, and for every combination of those objects I need to create a new object with its own unique id. The problem is that the list of objects is dynamic and used in a stateless environment, so the newly generated ids must be the same on every run.

To make it clearer what I need, treat the array of objects as an array of their ids, for example: [10, 7, 23]. Basically, I need an id for each possible combination:
10, 7
10, 23
7, 23
10, 7, 23

What's important here is that the generated id must always be the same for each distinct combination (for example, 10 and 7 should always produce the same id). Also, newly added objects should not affect previously generated ids: when a new object is later added to the list, the ids generated from the earlier combinations must stay exactly as they were before it was added.

Currently, my solution pretty much comes down to using the sum of the combined ids as the new id, so the resulting ids are:
17
33
30
40

Of course, this approach can produce duplicate ids, which is why I'm asking for advice on a more sophisticated algorithm. I also tried adding a fixed offset of 1000 to the generated ids and multiplying the sum by the number of objects in the combination, so that the resulting ids are, for example, 1034 (1000+(10+7)*2), 1066 (1000+(10+23)*2), etc., but I'm not sure that would save me from duplicates. :)
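To make the duplicate problem concrete, here is a minimal PHP sketch of both schemes described above. The function names `sumId` and `offsetId` are made up for illustration, and the ids 26, 20, 5 and 8 are hypothetical values picked only to trigger a collision.

```php
<?php
// Scheme 1 from the question: the new id is the sum of the member ids.
function sumId(array $ids): int
{
    return array_sum($ids);
}

// Scheme 2 from the question: 1000 + (sum of ids) * (number of ids).
function offsetId(array $ids): int
{
    return 1000 + array_sum($ids) * count($ids);
}

// [10, 23] is from the question; [7, 26] is a hypothetical second pair.
var_dump(sumId([10, 23]) === sumId([7, 26]));         // both sums are 33

// Hypothetical ids: a pair summing to 30 and a triple summing to 20.
var_dump(offsetId([10, 20]) === offsetId([5, 7, 8])); // both give 1060
```

Any two combinations of the same size with equal sums collide under either scheme, and the second one can also collide across sizes.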

To be clear, I need this for a PHP project, but since the problem is not language-specific, I hope some good mathematicians out there can come up with a good solution. :)

One useful piece of information: the ids being combined are in the range 10000-99999, and a combination never contains more than 10 items.

Please note that I don't need a way to build all the combinations from the array elements, only the "formula" for producing the integer id.

Thanks in advance.

Not really sure what your aim is, but I'll have a go...

Have you tried using string keys? For example, 10, 7, 3 becomes a single sequence joined with underscores. Each distinct sequence gives a unique key.

$arrayOfKeys = array(10, 7, 3);
$hash = implode('_', $arrayOfKeys);
print $hash;

# 10_7_3

Personally I'd go for this simple approach. If you're using a database and you're not producing, say, 100k records per day, it should be pretty fast using an indexed (primary key or unique) varchar field.

If you have to create numbers, here's a tip: take the number of digits in the largest id and use that as the prefix of your sequence, then zero-pad every id to that width, e.g.:

10, 5, 1 -> 2100501
105, 45, 201 -> 3105045201

The prefix tells you how long each of the following chunks is. I can't think of any way you'd get duplicates... Anyone? ;)
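A rough PHP sketch of this prefix scheme, under a few assumptions: the function name `prefixedId` is made up, the ids are non-negative and have fewer than 10 digits (so the prefix is a single digit), and the result is returned as a string because it can easily outgrow a native integer. With the 5-digit ids and up to 10 items from the question, the result would be 51 digits long.

```php
<?php
// Builds "<width><id zero-padded to width>..." as a numeric string.
// Assumes non-negative ids with fewer than 10 digits each.
function prefixedId(array $ids): string
{
    // Number of digits in the largest id; this becomes the prefix.
    $width = strlen((string) max($ids));

    // Zero-pad every id to that width so the chunks can be told apart.
    $padded = array_map(
        fn (int $id): string => str_pad((string) $id, $width, '0', STR_PAD_LEFT),
        $ids
    );

    return $width . implode('', $padded);
}

echo prefixedId([10, 5, 1]), "\n";      // 2100501
echo prefixedId([105, 45, 201]), "\n";  // 3105045201
```

The sketch keeps the ids in the order given so that it reproduces the two examples above; if 10, 7 and 7, 10 must map to the same id, the array would need to be sorted before it is passed in.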

Hope it helps...

Step 1: Sort the values you get, e.g. whether you get 10, 7 or 7, 10, it should come out as 7, 10 before going to the ID generator. If you know the range of your numbers (let's assume [0-100]), a radix or counting sort will be fast.

Step 2: Represent the numbers as strings, separated by any separator you choose (':' maybe), e.g. for 7, 10 the id becomes "7:10".

The sorting is there to avoid generating different IDs for 10, 7 and 7, 10.
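The two steps could be sketched in PHP roughly like this. The function name `combinationKey` is made up for illustration, and the built-in `sort()` stands in for the radix or counting sort suggested above.

```php
<?php
// Step 1: sort the ids; step 2: join them with a separator.
function combinationKey(array $ids, string $separator = ':'): string
{
    // PHP passes arrays by value, so this sorts a local copy only.
    sort($ids, SORT_NUMERIC);

    return implode($separator, $ids);
}

echo combinationKey([10, 7]), "\n";      // 7:10
echo combinationKey([7, 10]), "\n";      // 7:10
echo combinationKey([23, 10, 7]), "\n";  // 7:10:23
```

Since the key depends only on the ids in the combination, adding new objects to the list later does not change any key that was generated earlier.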

BTW What do these numbers represent?

I don't think this is possible unless you allow labels of increasing length.

Assume you have a maximum of N distinct objects, corresponding to N distinct labels.

If you want to be able to represent all possible pairs, assuming the order within a pair does not matter, you potentially need N(N-1)/2 extra labels, whatever they are, and you need to reserve them all.

And for all triples, N(N-1)(N-2)/6; for all quads, N(N-1)(N-2)(N-3)/24; and so on.

This grows extremely fast (over all subset sizes the total is 2^N - N - 1) and will very quickly exceed the capacity of integers.
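To put rough numbers on this for the range given in the question (ids 10000-99999, so N = 90000 possible ids, and at most 10 items per combination), here is a small PHP sketch. The helper name `binomial` is made up, and it uses floats, so the larger counts are approximations.

```php
<?php
// Approximate "n choose k" as a float: n(n-1)...(n-k+1) / k!
function binomial(int $n, int $k): float
{
    $result = 1.0;
    for ($i = 0; $i < $k; $i++) {
        $result *= ($n - $i) / ($i + 1);
    }
    return $result;
}

$n = 90000; // number of possible ids in 10000-99999

// Print the number of labels needed for each combination size,
// and whether that count alone is beyond PHP_INT_MAX.
for ($k = 2; $k <= 10; $k++) {
    $count = binomial($n, $k);
    printf("%2d items: %.3e labels%s\n", $k, $count,
        $count > PHP_INT_MAX ? ' (more than PHP_INT_MAX)' : '');
}
```

For the larger sizes the count alone exceeds what a 64-bit integer can hold, so no scheme can assign a distinct native integer to every possible combination.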

Any other solution that tries to compress the space of labels, such as hashing, will result in collisions. You can resolve the collisions by maintaining a collision table, but that breaks the "generated ids must be same for every run" requirement.
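For reference, a hashed integer id might look like the PHP sketch below. This is only an illustration of the trade-off: the function name `hashedId` is made up, it assumes a 64-bit PHP build, and it does not remove collisions. It only makes them unlikely, and nothing in it detects or resolves one if it happens.

```php
<?php
// Deterministic 60-bit integer derived from the sorted ids.
// Assumes 64-bit PHP. Collisions are improbable but still possible.
function hashedId(array $ids): int
{
    // Sort a local copy so 10, 7 and 7, 10 hash to the same value.
    sort($ids, SORT_NUMERIC);

    $digest = hash('sha256', implode(':', $ids));

    // 15 hex digits = 60 bits, which fits in a signed 64-bit integer.
    return (int) hexdec(substr($digest, 0, 15));
}

var_dump(hashedId([10, 7]) === hashedId([7, 10])); // same combination, same id
```

The result is the same on every run and does not depend on which other objects exist, but uniqueness is only probabilistic.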
