Efficient way of re-numbering elements in an array

Question

I am reasonably new to Python and am trying to implement a genetic algorithm, but need some assistance with the code for one of the operations.

I have formulated the problem this way:

each individual I is represented by a string of M integers
each element e in I takes a value from 0 to N
every number from 0 - N must appear in I at least once
the value of e is not important, so long as each uniquely valued element takes the same unique value (think of them as class labels)
e is less than or equal to N
N can be different for each I

After applying the crossover operation I can potentially generate children which violate one or more of these constraints, so I need to find a way to re-number the elements so that they retain their properties but fit the constraints.

for example:

parent_1 (N=5): [1 3 5 4 2 1|0 0 5 2]
parent_2 (N=3): [2 0 1 3 0 1|0 2 1 3]

*** crossover applied at "|" ***

child_1: [1 3 5 4 2 1 0 2 1 3]
child_2: [2 0 1 3 0 1 0 0 5 2]

child_1 obviously still satisfies all of the constraints, as N = 5 and all values 0-5 appear at least once in the array.
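The constraints can also be checked mechanically. This is only a minimal sketch, assuming an individual is a non-empty Python list of non-negative integers and that N is taken to be the largest label present; the helper name `satisfies_constraints` is made up for illustration.

```python
def satisfies_constraints(individual):
    """Return True if the labels used are exactly 0..N with no gaps."""
    labels = set(individual)
    # Assumption of this sketch: N is the largest label that appears.
    n = max(labels)
    return labels == set(range(n + 1))


child_1 = [1, 3, 5, 4, 2, 1, 0, 2, 1, 3]
child_2 = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]

print(satisfies_constraints(child_1))  # True: 0..5 all appear
print(satisfies_constraints(child_2))  # False: 4 is missing
```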

The problem lies with child_2: if we use max(child_2) to calculate N we get 5, but only five distinct values appear (0, 1, 2, 3 and 5), so N should be 4. What I am asking (in a very long-winded way, granted) is: what is a good, Pythonic way of doing this:

child_2: [2 0 1 3 0 1 0 0 5 2]
*** some python magic ***
child_2':  [2 0 1 3 0 1 0 0 4 2]
*or*
child_2'': [0 1 2 3 1 2 1 1 4 0]

child_2'' is there to illustrate that the values themselves don't matter: so long as each element of a unique value maps to the same value, the constraints are satisfied.

Here is what I have tried so far:

value_map = []
for el in child:
    if el not in value_map:
        value_map.append(el)

for ii in range(0,len(child)):
    child[ii] = value_map.index(child[ii])

This approach works and returns a result similar to child_2'', but I can't imagine that it is very efficient, given that it iterates over the string twice, so I was wondering if anyone has any suggestions for how to make it better.

Thanks, and sorry for such a long post for such a simple question!

Answer 1

You will need to iterate over the list more than once; I don't think there's any way around this. After all, you first have to determine the number of different elements (first pass) before you can start changing elements (second pass). Note, however, that depending on the number of different elements you might end up with up to O(n^2) time due to the repeated calls to index and not in, each of which is O(n) on a list.

Alternatively, you could use a dict instead of a list for your value_map . A dictionary has much faster lookup than a list, so this way, the complexity should indeed be on the order of O(n). You can do this using (1) a dictionary comprehension to determine the mapping of old to new values, and (2) a list comprehension for creating the updated child.

value_map = {el: i for i, el in enumerate(set(child))}
child2 = [value_map[el] for el in child]

Or change the child in-place using a for loop.

for i, el in enumerate(child):
    child[i] = value_map[el]
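One thing to keep in mind is that a set does not promise any particular iteration order, so which old value ends up with which new label is not something to rely on. If a reproducible mapping matters, a possible variation is to sort the unique values first. The function name `renumber_sorted` is only illustrative, and the sort adds O(k log k) for k distinct values.

```python
def renumber_sorted(child):
    """Relabel so the smallest original value becomes 0, the next one 1, and so on."""
    # sorted() makes the old -> new mapping deterministic.
    value_map = {el: i for i, el in enumerate(sorted(set(child)))}
    return [value_map[el] for el in child]


child = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
print(renumber_sorted(child))  # [2, 0, 1, 3, 0, 1, 0, 0, 4, 2]
```

For the example from the question this happens to reproduce child_2'.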

Answer 2

You can do it with a single loop like this:

value_map = []
result = []
for el in child:
    if el not in value_map:
        value_map.append(el)
    result.append(value_map.index(el))
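The loop above still scans value_map on every `not in` and `index` call. As a sketch of the same single-loop idea with constant-time lookups, a dict can store each value's new label directly (the function name `renumber_first_seen` is just for illustration):

```python
def renumber_first_seen(child):
    """Label values 0, 1, 2, ... in the order they first appear."""
    value_map = {}  # old value -> new label
    result = []
    for el in child:
        if el not in value_map:
            # The next free label equals the number of values seen so far.
            value_map[el] = len(value_map)
        result.append(value_map[el])
    return result


print(renumber_first_seen([2, 0, 1, 3, 0, 1, 0, 0, 5, 2]))
# [0, 1, 2, 3, 1, 2, 1, 1, 4, 0]
```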

Answer 3

One solution I can think of is:

Determine the value of N and determine the unused integers (this forces you to iterate over the array once).
Go through the array and, each time you meet a number greater than N, map it to an unused integer.

This forces you to go through the array twice, but it should be faster than your example (which forces you to search the value_map for each element of the array).

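child = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]

used = set(child)                      # labels that actually occur
N = len(used) - 1                      # largest label allowed after renumbering
unused = set(range(N + 1)) - used      # labels in 0..N that are still free

value_map = dict()
for i, e in enumerate(child):
    if e <= N:
        continue                       # already a valid label, keep it
    if e not in value_map:
        value_map[e] = unused.pop()    # assign a free label to this out-of-range value
    child[i] = value_map[e]
print(child)  # [2, 0, 1, 3, 0, 1, 0, 0, 4, 2]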

Answer 4

I like @Selçuk Cihan's answer. It can also be done in place.

>>> child = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
>>>
>>> value_map = []
>>> for i in range(len(child)):
...     el = child[i]
...     if el not in value_map:
...         value_map.append(el)
...     child[i] = value_map.index(el)
...
>>> child
[0, 1, 2, 3, 1, 2, 1, 1, 4, 0]

Answer 5

I believe that this works, although I didn't test it for more than the single case that is given in the question.

The only thing that bothers me is that value_map appears three times in the code...

def renumber(individual):
    """
    >>> renumber([2, 0, 1, 3, 0, 1, 0, 0, 4, 2])
    [0, 1, 2, 3, 1, 2, 1, 1, 4, 0]
    """
    value_map = {}
    return [value_map.setdefault(e, len(value_map)) for e in individual]
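To go a little beyond the single case, here is a rough randomized check. It relies on `len(value_map)` being evaluated before `setdefault` inserts the key, which is how Python evaluates call arguments. The `check` helper and the chosen ranges are assumptions made up for this sketch, not part of the original code.

```python
import random


def renumber(individual):
    value_map = {}
    return [value_map.setdefault(e, len(value_map)) for e in individual]


def check(individual):
    result = renumber(individual)
    distinct = len(set(individual))
    # Labels must be exactly 0..N with no gaps.
    assert set(result) == set(range(distinct))
    # The old -> new mapping must be one-to-one, so equal elements stay
    # equal and different elements stay different.
    pairs = set(zip(individual, result))
    assert len(pairs) == distinct == len(set(result))
    return result


rng = random.Random(0)  # fixed seed so the run is repeatable
for _ in range(1000):
    length = rng.randint(1, 20)
    check([rng.randint(0, 30) for _ in range(length)])

print(check([1, 3, 5, 4, 2, 1, 0, 2, 1, 3]))  # child_1 from the question
# [0, 1, 2, 3, 4, 0, 5, 4, 0, 1]
```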

Answer 6

Here is a fast solution, which iterates the list only once.

a = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
b = [-1]*len(a)
j = 0
for i in range(len(a)):
    if b[a[i]] == -1:
        b[a[i]] = j
        a[i] = j
        j += 1
    else:
        a[i] = b[a[i]]

print(a) # [0, 1, 2, 3, 1, 2, 1, 1, 4, 0]
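The lookup table `b` is indexed by the label itself, so it needs at least max(a) + 1 slots. In the question's setting every label should be smaller than the length of the list, so `len(a)` slots are enough there. For input where that is not guaranteed, one possible variation is sketched below; it assumes non-negative integers, costs one extra pass for `max`, and the name `renumber_with_table` is only illustrative.

```python
def renumber_with_table(a):
    """First-seen relabeling using a list as the lookup table."""
    if not a:
        return []
    table = [-1] * (max(a) + 1)  # -1 marks "label not seen yet"
    next_label = 0
    result = []
    for value in a:
        if table[value] == -1:
            table[value] = next_label
            next_label += 1
        result.append(table[value])
    return result


print(renumber_with_table([0, 7, 7, 2]))  # [0, 1, 1, 2]
```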

6 answers

solution1
2 ACCEPTED 2015-04-21 11:39:55

solution2
1 2015-04-21 11:40:27

solution3
1 2015-04-21 11:42:56

solution4
0 2015-04-21 11:47:31

solution5
0 2015-04-21 11:57:46

solution6
0 2021-08-17 08:30:25
