
Optimize Double For Loop Using NumPy

I have a Python function with a nested for loop that is called thousands of times, and it is too slow. From what I have read online, there should be a way to optimize it with numpy vectorization so that the iteration is done in much faster C code rather than Python. But I have never worked with numpy before and I can't figure it out.

The function is below. The first parameter is a 2-dimensional array (a list of lists). The second parameter is a list of rows of the 2D array to check. The third parameter is a list of columns of the 2D array to check (note that the number of rows is not equal to the number of columns). The fourth parameter is a value with which to compare elements of the 2D array. I am trying to return a list that, for each column, contains a list of all row indices corresponding to elements equal to val.

def filter_indices(my_2d_arr, rows, cols, val):
    result_indices = []

    for c in cols:
        col_indices = []

        for idx in rows:
            if my_2d_arr[idx][c] == val:
                col_indices.append(idx)
        result_indices.append(col_indices)

    return result_indices
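
For example, on a toy input (made-up data, just to show the shape of the output I want):

# Toy data, only to illustrate the expected output.
my_2d_arr = [
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]

# Check rows 0 and 2, columns 1 and 2, for the value 0.
print(filter_indices(my_2d_arr, [0, 2], [1, 2], 0))
# [[2], [0]]  -> row 2 matches in column 1, row 0 matches in column 2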

Like I said, this is way too slow and I am rather confused about how I could vectorize this in numpy. Any pointers/guidance would be great.

EDIT

@BM Thanks for your answer. I ran your solution by itself, separately from the rest of my code, and compared it with my previous function without numpy. Like you said, it ran much faster with numpy than my original function did. However, when running it as part of my code, the numpy version is actually slower for some reason. I did have to add a little to your function and modify some of my existing code to make them compatible, but what throws me off is that timeit says the numpy version is faster while cProfile shows my original filter_indices being faster than the new numpy one. I have no idea how the numpy filter_indices could take so much longer, considering that it was faster when run separately from the rest of my code.

Here's my original filter_indices without numpy:

def filter_indices_orig(a, data_indices, feature_set, val):
    result_indices = []

    for feature_no in feature_set:
        feature_indices = []

        for idx in data_indices:
            if a[idx][feature_no] == val:
                feature_indices.append(idx)
        result_indices.append(feature_indices)

    return result_indices

Here's my slightly modified filter_indices with numpy:

def filter_indices(a, data_indices, feature_set, val):
    result_indices = {}
    # wrap the meshgrid result in tuple(): passing a plain list of index
    # arrays for multidimensional indexing is deprecated/removed in newer numpy
    sub = a[tuple(np.meshgrid(data_indices, feature_set, indexing='ij'))]
    r, c = (sub == val).nonzero()
    rs = np.take(data_indices, r)
    cs = np.take(feature_set, c)
    coords = zip(rs, cs)

    # group the matching row indices by column (feature) index
    for r, c in coords:
        feat_indices = result_indices.get(c, [])
        feat_indices.append(r)
        result_indices[c] = feat_indices

    return result_indices
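
This is roughly how I compared the two in isolation (the array sizes here are made up for illustration; my real data and call pattern are different):

import cProfile
import timeit
import numpy as np

# Toy sizes for illustration only; my real data is different.
a = np.random.randint(0, 2, (1000, 200))
data_indices = list(range(0, 1000, 2))
feature_set = list(range(0, 200, 2))

# Isolated timing with timeit
for fn in (filter_indices_orig, filter_indices):
    t = timeit.timeit(lambda: fn(a, data_indices, feature_set, 0), number=100)
    print(fn.__name__, t)

# Profiling a single call with cProfile
# (in my real code the profiler wraps the whole program, not just this call)
cProfile.run('filter_indices(a, data_indices, feature_set, 0)')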

EDIT 2

I figured out that the numpy solution is slower when I am only searching a few columns, but faster when I am searching a large number of columns. Unfortunately, even switching between the two (my original non-numpy solution when only a few columns are searched, the numpy solution when many are, as sketched below) is still slower than my original solution alone, which I do not understand whatsoever.
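
Roughly the hybrid dispatch I tried (the threshold is just a guess I tuned by hand):

# Pick an implementation based on how many columns are being searched.
# The threshold value is arbitrary/illustrative.
def filter_indices_hybrid(a, data_indices, feature_set, val, threshold=10):
    if len(feature_set) < threshold:
        # few columns: the plain-Python version
        return filter_indices_orig(a, data_indices, feature_set, val)
    # many columns: the numpy version
    # (note: this one returns a dict keyed by column rather than a list of
    #  lists, so the caller has to handle both forms)
    return filter_indices(a, data_indices, feature_set, val)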

Here is a function which returns two arrays, the row indices and the column indices of the elements whose value is val in the selected sub-array:

from numpy import meshgrid, take

def filter_indices_numpy(a, rows, cols, val):
    # tuple() is needed for newer numpy versions
    sub = a[tuple(meshgrid(rows, cols, indexing='ij'))]
    r, c = (sub == val).nonzero()
    return take(rows, r), take(cols, c)

Example:

from numpy.random import randint

a = randint(0, 3, (5, 5))

#array([[0, 1, 0, 2, 2],
#       [0, 0, 2, 0, 0],
#       [2, 1, 1, 0, 0],
#       [1, 0, 0, 1, 2],
#       [2, 1, 0, 0, 0]])

filter_indices_numpy(a,[1,2,3],[1,2,3],0)
#(array([1, 1, 2, 3, 3]), array([1, 3, 3, 1, 2]))

Some explanations:

meshgrid(rows,cols,indexing='ij') gives the indices of the selected rows and cols. sub is the sub-array. r,c = (sub==val).nonzero() gives the indices where the value is val within the sub-array. take(rows,r),take(cols,c) translates those indices back into the coordinates of the original array a.
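
A minimal walk-through of the same example with the intermediates written out (here np.ix_ is used, which builds broadcastable index arrays that select the same rows-by-cols sub-array as meshgrid with indexing='ij'):

import numpy as np

a = np.array([[0, 1, 0, 2, 2],
              [0, 0, 2, 0, 0],
              [2, 1, 1, 0, 0],
              [1, 0, 0, 1, 2],
              [2, 1, 0, 0, 0]])
rows, cols = [1, 2, 3], [1, 2, 3]

# select the sub-array at the given rows and cols
sub = a[np.ix_(rows, cols)]
# [[0 2 0]
#  [1 1 0]
#  [0 0 1]]

r, c = (sub == 0).nonzero()
# r = [0 0 1 2 2], c = [0 2 2 0 1]   (positions inside sub)

print(np.take(rows, r), np.take(cols, c))
# [1 1 2 3 3] [1 3 3 1 2]            (positions inside a)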

Test for: a = randint(0, 200, (1000, 1000)); rows = cols = arange(100)

In [4]: %timeit filter_indices(a,rows,cols,0)
10 loops, best of 3: 23.1 ms per loop

In [5]: %timeit filter_indices_numpy(a,rows,cols,0)
1000 loops, best of 3: 933 µs per loop

It's about 25x faster.
