Speed up python looping with numpy

Question

I've tried as much as possible to find a suitable solution/answer on the forum but I'm pulling blanks - most probably due to incorrect terminology.

I currently have the following block of code that performs very poorly when performing the below operations on the array areas[] - the size of the array is upto 150,000 elements.

areas = [x .. x1]
prices = [y .. y1]
for key, value in enumerate(areas):
    compListPrices = [prices[ind] for ind, x in enumerate(areas) if x == value and ind != key]
    # Further operations using compListPrices here

From my understanding, the execution of this code could be improved considerably by leveraging numpy, but I'm struggling to convert it.

Answer 1

Your implementation complexity is O(n²)

I would recommend using zip: O(n) then sort: O(n*log(n)) and a simple loop O(n) resulting in O(n*log(n)) complexity

Even though I'm not using numpy, a faster algorithm is usually faster than just move the logic to numpy

Here is an working example: https://abstra.show/yMFZqiUt8D

Answer 2

You can do what you want in O(n). Like below:

>>> areas  = [1,2,3,4,1,1,3,2]
>>> prices = [100,10,20,50,30,40,20,20]

# zipping areas with thier indexs
>>> lst = list(zip(areas , range(len(areas))))
>>> lst
[(1, 0), (2, 1), (3, 2), (4, 3), (1, 4), (1, 5), (3, 6), (2, 7)]

# save each values exist in each indexs
>>> dct = {}
>>> for l in lst:
....    dct.setdefault(l[0], []).append(l[1])
>>> dct
{1: [0, 4, 5], 2: [1, 7], 3: [2, 6], 4: [3]}

# find prices from their indexs
>>> res = [[] for _ in range(len(areas))]
>>> for k,v in dct.items():
....    for i in v:
....        res[i] = [prices[j] for j in v if i!=j]
>>> res
[[30, 40], [20], [20], [], [100, 40], [100, 30], [20], [10]]

Benchmark on Colab :

from numpy.random import randint
areas = randint(0, 10, 10_000)
prices = randint(0, 1000, 10_000)

def method_Npow2(areas, prices):
    res = []
    for key, value in enumerate(areas):
        res.append([prices[ind] for ind, x in enumerate(areas) if x == value and ind != key])
    return res

def method_Npow1(areas, prices):
    lst = list(zip(areas , range(len(areas))))
    dct = {}
    for l in lst:
        dct.setdefault(l[0], []).append(l[1])

    res = [[] for _ in range(len(areas))]
    for k,v in dct.items():
        for i in v:
            res[i] = [prices[j] for j in v if i!=j]    
    return res

Output:

%timeit method_Npow2(areas, prices)
# 1 loop, best of 5: 13.9 s per loop

%timeit method_Npow1(areas, prices)
# 1 loop, best of 5: 1.76 s per loop

m1 = method_Npow2(areas, prices)
m2 = method_Npow1(areas, prices)
np.all(np.asarray(m1) == np.asarray(m2))
# True

Speed up python looping with numpy

Question

2 answers

solution1
0 2022-06-05 23:24:23

solution2
0 2022-06-06 13:32:51

Speed up python looping with numpy

Question

2 answers

solution1 0 2022-06-05 23:24:23

solution2 0 2022-06-06 13:32:51

solution1
0 2022-06-05 23:24:23

solution2
0 2022-06-06 13:32:51