
What is a faster way to iterate through a numpy array in Python?

Is there a better way to iterate through numpy arrays? I have timed my nested loops and they take roughly 40-50 seconds per loop, so I am looking for a faster way to do it. I know that looping through numpy arrays is not ideal, but I'm out of ideas. I looked through many questions on Stack Overflow, but all of them ended up confusing me even more.

I have tried converting the numpy array to a list using the tolist() function, but the run time is just as slow, if not worse.

def euc_distance(array1, array2):
    return np.power(np.sum((array1 - array2)**2) , 0.5)

for i in range(N):
    for j,n in enumerate(data2.values): 
        distance = euc_distance(n, D[i]) 
        if distance < Dradius[i] and NormAttListTest[j] == "Attack":
            TP += 1

My euc_distance function takes two arrays as input (5-dimensional in my case) and returns a single scalar value. data2.values is how I access the underlying numpy array of a pandas DataFrame of shape [500 000, 5].

(Note that NormAttListTest is a list holding the categorical label, "Attack" or "Normal", of each individual test row.)

The problem is that you are not using numpy the way it is meant to be used: numpy, like MATLAB, is built around vectorized computations. Consider the following modification of your code. I replaced the inner loop over the rows of the numpy array with plain numpy code that applies vectorized operations to the whole 2D array at once. As a result, the code runs roughly 100 times faster in the benchmark below.

import functools
import numpy as np
import time

# decorator to measure running time
def measure_running_time(echo=True):
    def decorator(func):
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            t_1 = time.time()
            ans = func(*args, **kwargs)
            t_2 = time.time()
            if echo:
                print(f'{func.__name__}() running time is {t_2 - t_1:.2f} s')
            return ans
        return wrapped
    return decorator


def euc_distance(array1, array2):
    return np.power(np.sum((array1 - array2) ** 2), 0.5)

# original function
@measure_running_time()
def calculate_TP_1(N, data2, D, Dradius, NormAttListTest, TP=0):
    for i in range(N):
        for j, n in enumerate(data2):
            distance = euc_distance(n, D[i])
            if distance < Dradius[i] and NormAttListTest[j] == "Attack":
                TP += 1
    return TP

# new version
@measure_running_time()
def calculate_TP_2(N, data2, D, Dradius, NormAttListTest, TP=0):

    # the label check does not depend on i, so compute the boolean mask once
    NormAttListTest = np.array([val == 'Attack' for val in NormAttListTest])

    for i in range(N):

        # no Python-level loop over the rows of data2:
        # broadcasting subtracts D[i] from every row at once

        # distances from D[i] to all the rows, shape (len(data2),)
        distance = np.sum((data2 - D[i]) ** 2, axis=1) ** .5
        # count the rows that are inside the radius and labelled 'Attack'
        TP += np.sum((distance < Dradius[i]) & (NormAttListTest))

    return TP


if __name__ == '__main__':

    N = 10
    NN = 100_000
    D = np.random.randint(0, 10, (N, 5))
    Dradius = np.random.randint(0, 10, (N,))
    NormAttListTest = ['Attack'] * NN
    NormAttListTest[:NN // 2] = ['Defence'] * (NN // 2)
    data2 = np.random.randint(0, 10, (NN, 5))

    print(calculate_TP_1(N, data2, D, Dradius, NormAttListTest))

    print(calculate_TP_2(N, data2, D, Dradius, NormAttListTest))

Output:

calculate_TP_1() running time is 7.24 s
96476
calculate_TP_2() running time is 0.06 s
96476
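
The remaining loop over i can also be removed with broadcasting. The following is only a sketch, not part of the benchmark above: the function name count_true_positives and its parameter names are made up for illustration, and it assumes that the intermediate array of shape (N, M, 5), where M is the number of "Attack" rows, fits in memory. If it does not, the same idea can be applied to chunks of rows.

```python
import numpy as np


def count_true_positives(data, centers, radii, labels, positive="Attack"):
    """Count (center, row) pairs where a positive-labelled row lies
    strictly inside the radius of the center (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    centers = np.asarray(centers, dtype=float)
    radii = np.asarray(radii, dtype=float)

    # Only positive-labelled rows can contribute, so filter them out first.
    is_positive = np.asarray(labels) == positive
    points = data[is_positive]                        # shape (M, d)

    # Broadcast (N, 1, d) against (1, M, d): all pairwise differences.
    diff = centers[:, None, :] - points[None, :, :]   # shape (N, M, d)
    dist = np.sqrt((diff ** 2).sum(axis=2))           # shape (N, M)

    # Compare each row of distances with the radius of its own center.
    return int(np.count_nonzero(dist < radii[:, None]))


if __name__ == "__main__":
    centers = np.array([[0, 0], [10, 10]])
    radii = np.array([2, 1])
    data = np.array([[1, 0], [0, 1], [10, 10], [5, 5]])
    labels = ["Attack", "Normal", "Attack", "Attack"]
    print(count_true_positives(data, centers, radii, labels))
```

With the variables from the benchmark, the call would be count_true_positives(data2, D, Dradius, NormAttListTest). Whether this beats the single-loop version depends on N and on available memory, so it is worth timing both on the real data.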
