简体   繁体   中英

Speed up python looping with numpy

I've tried as much as possible to find a suitable solution/answer on the forum but I'm pulling blanks - most probably due to incorrect terminology.

I currently have the following block of code that performs very poorly when performing the below operations on the array areas[] - the size of the array is upto 150,000 elements.

areas = [x .. x1]
prices = [y .. y1]
for key, value in enumerate(areas):
    compListPrices = [prices[ind] for ind, x in enumerate(areas) if x == value and ind != key]
    # Further operations using compListPrices here

From my understanding, the execution of this code could be improved considerably by leveraging numpy, but I'm struggling to convert it.

Your implementation complexity is O(n²)

I would recommend using zip: O(n) then sort: O(n*log(n)) and a simple loop O(n) resulting in O(n*log(n)) complexity

Even though I'm not using numpy, a faster algorithm is usually faster than just move the logic to numpy

Here is an working example: https://abstra.show/yMFZqiUt8D

You can do what you want in O(n). Like below:

>>> areas  = [1,2,3,4,1,1,3,2]
>>> prices = [100,10,20,50,30,40,20,20]

# zipping areas with thier indexs
>>> lst = list(zip(areas , range(len(areas))))
>>> lst
[(1, 0), (2, 1), (3, 2), (4, 3), (1, 4), (1, 5), (3, 6), (2, 7)]

# save each values exist in each indexs
>>> dct = {}
>>> for l in lst:
....    dct.setdefault(l[0], []).append(l[1])
>>> dct
{1: [0, 4, 5], 2: [1, 7], 3: [2, 6], 4: [3]}

# find prices from their indexs
>>> res = [[] for _ in range(len(areas))]
>>> for k,v in dct.items():
....    for i in v:
....        res[i] = [prices[j] for j in v if i!=j]
>>> res
[[30, 40], [20], [20], [], [100, 40], [100, 30], [20], [10]]

Benchmark on Colab :

from numpy.random import randint
areas = randint(0, 10, 10_000)
prices = randint(0, 1000, 10_000)

def method_Npow2(areas, prices):
    res = []
    for key, value in enumerate(areas):
        res.append([prices[ind] for ind, x in enumerate(areas) if x == value and ind != key])
    return res

def method_Npow1(areas, prices):
    lst = list(zip(areas , range(len(areas))))
    dct = {}
    for l in lst:
        dct.setdefault(l[0], []).append(l[1])

    res = [[] for _ in range(len(areas))]
    for k,v in dct.items():
        for i in v:
            res[i] = [prices[j] for j in v if i!=j]    
    return res

Output:

%timeit method_Npow2(areas, prices)
# 1 loop, best of 5: 13.9 s per loop

%timeit method_Npow1(areas, prices)
# 1 loop, best of 5: 1.76 s per loop

m1 = method_Npow2(areas, prices)
m2 = method_Npow1(areas, prices)
np.all(np.asarray(m1) == np.asarray(m2))
# True

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM