Reducing time-complexity of python (2.7) algorithm

Question

My input is a list of three lists, holding the X, Y and Z coordinates respectively. For example:

coords = [[2, 1, 5, 2, 8, 6, 8, 6, 1, 2, 3, 4], [1, 3, 4, 1, 2, 2, 2, 4, 2, 3, 4, 5], [2, 4, 7, 2, 1, 2, 1, 4, 5, 6, 9, 8]]

so the list of X coordinates is: X = [2, 1, 5, 2, 8, 6, 8, 6, 1, 2, 3, 4]

A point is formed by taking the i-th element of each list; for example, the first point is point = [2, 1, 2]. Each XYZ point represents a vertex of a cube (my program analyzes a set of cubes that are stacked or placed side by side).
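A minimal sketch of how the points can be built from the three coordinate lists, assuming the sample `coords` above: `zip(*coords)` groups the i-th X, Y and Z values into an `(x, y, z)` tuple.

```python
# Sample input from the question: one list per axis (X, Y, Z).
coords = [[2, 1, 5, 2, 8, 6, 8, 6, 1, 2, 3, 4],
          [1, 3, 4, 1, 2, 2, 2, 4, 2, 3, 4, 5],
          [2, 4, 7, 2, 1, 2, 1, 4, 5, 6, 9, 8]]

# zip(*coords) pairs the i-th X, Y and Z values into one (x, y, z) tuple.
# list() is needed on Python 3, where zip returns a lazy iterator.
points = list(zip(*coords))

print(points[0])    # (2, 1, 2)
print(len(points))  # 12
```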

The function should return a list of IDs with one entry per point (i.e. as long as any one of the coordinate lists). Distinct points must get distinct IDs, assigned in increasing order as the points are iterated; a new point's ID is its 1-based position in the list. When a point has already been seen (for example, when a vertex of one cube coincides with a vertex of another cube), it must get the same ID as its first occurrence.

For the example, the result should be outp = [1, 2, 3, 1, 5, 6, 5, 8, 9, 10, 11, 12]
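To pin down the expected output, here is a deliberately simple reference definition (the name `reference_ids` is hypothetical, not from the question): each point's ID is 1 plus the index of its first occurrence. It is O(n^2), so it is only meant for checking faster versions on small inputs.

```python
def reference_ids(coords):
    """Return, for each point, 1 + the index of its first occurrence.

    Straightforward but O(n^2): points.index() rescans the list from the
    start for every point, so use it only to check faster versions.
    """
    points = list(zip(*coords))
    return [points.index(p) + 1 for p in points]


coords = [[2, 1, 5, 2, 8, 6, 8, 6, 1, 2, 3, 4],
          [1, 3, 4, 1, 2, 2, 2, 4, 2, 3, 4, 5],
          [2, 4, 7, 2, 1, 2, 1, 4, 5, 6, 9, 8]]

print(reference_ids(coords))
# [1, 2, 3, 1, 5, 6, 5, 8, 9, 10, 11, 12]
```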

This is the code I wrote and it works perfectly:

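def AssignIDtoNode(coords):

    outp = []
    n_points = len(coords[0])

    memo_set = set()                                # points seen so far (fast membership test)
    memo_lst = ["" for x in xrange(0, n_points)]    # each point stored at the index where it first appears

    for i in range(n_points):

        # Build a string key for the i-th point
        point = "(X = " + str(coords[0][i]) + ", Y = " + str(coords[1][i]) + ", Z = " + str(coords[2][i]) + ")"
        if point not in memo_set:
            # New point: its ID is its 1-based position
            outp.append(i+1)
            memo_set.add(point)
            memo_lst[i] = point
        else:
            # Repeated point: linear scan for its first occurrence
            ind = memo_lst.index(point)
            outp.append(ind+1)

    return outp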

The problem arises when the input contains a very large number of points (millions), and the running time grows considerably. I convert each point into a string to make lookups easier, and I use a set where possible so that the first membership test is fast. I think most of the time is spent searching for the index of a point with .index(), which scans memo_lst element by element, so with many repeated points the function approaches quadratic time.
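The suspicion about .index() can be checked with a small timing sketch. It compares `list.index` (a linear scan) with a dictionary lookup (a hash lookup, constant time on average); the string keys and the sizes are arbitrary assumptions, and the absolute timings will vary by machine and Python version.

```python
import timeit

n = 100000
keys = ["point-%d" % k for k in range(n)]   # hypothetical string keys
as_list = list(keys)
as_dict = {key: idx for idx, key in enumerate(keys)}

target = keys[-1]  # worst case for list.index: the last element

# list.index compares element by element: O(n) per lookup.
t_list = timeit.timeit(lambda: as_list.index(target), number=100)
# A dict lookup hashes the key once: O(1) on average.
t_dict = timeit.timeit(lambda: as_dict[target], number=100)

print("list.index: %.4f s, dict lookup: %.6f s" % (t_list, t_dict))
```

Both lookups return the same position; only the cost per lookup differs, which is what the answers below exploit.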

Is there any way to further optimize this function?

Answer 1

Use enumerate, zip and a dictionary that maps each point to its first index, {(x, y, z): index, ...}:

def f(coords):

    d = {}
    outp = []
    for i,punto in enumerate(zip(*coords),1):
        d[punto] = d.get(punto,i)    # if it doesn't exist add it with the current index
        outp.append(d[punto])
                
    return outp

Single pass through the points, no type conversions, and constant-time (on average) lookups.

>>> AssignIDtoNode(coords) == f(coords)
True
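The "no type conversions" point works because the tuples produced by zip are hashable and can be used directly as dictionary keys, whereas lists cannot. A minimal illustration:

```python
d = {}

point = (2, 1, 2)      # tuple: immutable and hashable, so a valid dict key
d[point] = 1
print(d[(2, 1, 2)])    # 1 (equal tuples hash equally, so the lookup succeeds)

try:
    d[[2, 1, 2]] = 1   # list: mutable, therefore unhashable
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(list_key_ok)     # False
```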

See the Python documentation for zip and enumerate.

An LBYL ("look before you leap") version, which checks for the key before inserting it:

def g(coords):
    outp = []
    d = {}
    for i,punto in enumerate(zip(*coords),1):
        if punto not in d:
            d[punto] = i
        outp.append(d[punto])        
    return outp

g is about 25% faster than f for 1 million and 3 million (x,y,z) points.
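A minimal timing harness for comparing f and g yourself, assuming random integer coordinates on a small grid so that many points repeat (the data sizes are arbitrary). It does not reproduce the 25% figure; results depend on the machine and the Python version.

```python
import random
import timeit


def f(coords):
    d = {}
    outp = []
    for i, punto in enumerate(zip(*coords), 1):
        d[punto] = d.get(punto, i)    # keep the existing index, else add the current one
        outp.append(d[punto])
    return outp


def g(coords):
    outp = []
    d = {}
    for i, punto in enumerate(zip(*coords), 1):
        if punto not in d:            # LBYL: only insert new points
            d[punto] = i
        outp.append(d[punto])
    return outp


# Hypothetical test data: n random points on a small grid, so that many
# points repeat (as shared cube vertices would).
random.seed(0)
n = 100000
data = [[random.randint(0, 50) for _ in range(n)] for _ in range(3)]

assert f(data) == g(data)             # both must agree before timing them
for func in (f, g):
    seconds = timeit.timeit(lambda: func(data), number=5)
    print("%s: %.3f s" % (func.__name__, seconds))
```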

Answer 2

Use a dict that maps each point to its index:

def AssignIDtoNode(coords):

    outp = []
    n_points = len(coords[0])

    memo_dict = dict()    # maps (x, y, z) -> 1-based ID of its first occurrence

    for i in range(n_points):

        point = (coords[0][i], coords[1][i], coords[2][i])
        if point not in memo_dict:
            outp.append(i+1)
            memo_dict[point] = i+1
        else:
            ind = memo_dict[point]    # already stored as a 1-based ID
            outp.append(ind)

    return outp
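Mixing 1-based and 0-based indices is an easy place to slip (storing i+1 and then adding 1 again would shift the IDs of repeated points), so a quick randomized check against a brute-force definition is cheap insurance. In this sketch, `assign_ids_dict` and `reference_ids` are hypothetical names: the first re-declares the dict-based approach so the snippet is self-contained, the second is the O(n^2) "index of first occurrence" definition.

```python
import random


def assign_ids_dict(coords):
    # Dict-based approach: (x, y, z) -> 1-based ID of its first occurrence.
    memo_dict = {}
    outp = []
    for i in range(len(coords[0])):
        point = (coords[0][i], coords[1][i], coords[2][i])
        if point not in memo_dict:
            memo_dict[point] = i + 1   # store the 1-based ID once
        outp.append(memo_dict[point])  # reuse it unchanged, no extra +1
    return outp


def reference_ids(coords):
    # Brute force: 1 + index of the first occurrence of each point.
    points = list(zip(*coords))
    return [points.index(p) + 1 for p in points]


random.seed(1)
for trial in range(200):
    n = random.randint(1, 30)
    # Tiny coordinate range so that repeated points are common.
    coords = [[random.randint(0, 3) for _ in range(n)] for _ in range(3)]
    assert assign_ids_dict(coords) == reference_ids(coords), coords
print("all trials passed")
```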

Answer 3

You should be able to do it in linear time with a single pass using a list comprehension with an internal dictionary:

coords = [[2, 1, 5, 2, 8, 6, 8, 6, 1, 2, 3, 4], 
          [1, 3, 4, 1, 2, 2, 2, 4, 2, 3, 4, 5], 
          [2, 4, 7, 2, 1, 2, 1, 4, 5, 6, 9, 8]]

IDs = [d[c] for d in [dict()] for c in zip(*coords) if d.setdefault(c,len(d)+1)]

print(IDs)
# [1, 2, 3, 1, 4, 5, 4, 6, 7, 8, 9, 10]
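Note that the comprehension above gives distinct points consecutive IDs (1 to 10 for the sample), whereas the question's example uses each point's 1-based position (so the repeat of the fifth point gets 5). If the position-based numbering is what is wanted, a minimal variation, assuming the same `coords`, is to pass the enumerate index to setdefault:

```python
coords = [[2, 1, 5, 2, 8, 6, 8, 6, 1, 2, 3, 4],
          [1, 3, 4, 1, 2, 2, 2, 4, 2, 3, 4, 5],
          [2, 4, 7, 2, 1, 2, 1, 4, 5, 6, 9, 8]]

# setdefault(c, i) stores the current 1-based position the first time c is
# seen, and returns the stored position on every later visit.
IDs = [d.setdefault(c, i) for d in [{}] for i, c in enumerate(zip(*coords), 1)]

print(IDs)
# [1, 2, 3, 1, 5, 6, 5, 8, 9, 10, 11, 12]
```

This is still a single linear pass; the only change is which value is stored for a newly seen point.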

3 answers

solution1
1 2020-12-30 17:51:12

solution2
0 2020-12-30 17:18:33

solution3
0 2021-01-02 22:21:49
