简体   繁体   中英

efficiently compute ordering permutations in numpy array

I've got a numpy array. What is the fastest way to compute all the permutations of orderings.

What I mean is, given the first element in my array, I want a list of all the elements that sequentially follow it. Then given the second element, a list of all the elements that follow it.

So given my list: b, c, & d follow a. c & d follow b, and d follows c.

x = np.array(["a", "b", "c", "d"])

So a potential output looks like:

[
    ["a","b"],
    ["a","c"],
    ["a","d"],

    ["b","c"],
    ["b","d"],

    ["c","d"],
]

I will need to do this several million times so I am looking for an efficient solution.

I tried something like:

im = np.vstack([x]*len(x))
a = np.vstack(([im], [im.T])).T
results = a[np.triu_indices(len(x),1)]

but its actually slower than looping...

You can use itertools 's functions like chain.from_iterable and combinations with np.fromiter for this. This involves no loop in Python, but still not a pure NumPy solution:

>>> from itertools import combinations, chain
>>> arr = np.fromiter(chain.from_iterable(combinations(x, 2)), dtype=x.dtype)
>>> arr.reshape(arr.size/2, 2)
array([['a', 'b'],
       ['a', 'c'],
       ['a', 'd'],
       ..., 
       ['b', 'c'],
       ['b', 'd'],
       ['c', 'd']], 
      dtype='|S1')

Timing comparisons:

>>> x = np.array(["a", "b", "c", "d"]*100)
>>> %%timeit
    im = np.vstack([x]*len(x))
    a = np.vstack(([im], [im.T])).T
    results = a[np.triu_indices(len(x),1)]
... 
10 loops, best of 3: 29.2 ms per loop
>>> %%timeit
    arr = np.fromiter(chain.from_iterable(combinations(x, 2)), dtype=x.dtype)
    arr.reshape(arr.size/2, 2)
... 
100 loops, best of 3: 6.63 ms per loop

I've been browsing the source and it seems the tri functions have had some very substantial improvements relatively recently. The file is all Python so you can just copy it into your directory if that helps.

I seem to have completely different timings to Ashwini Chaudhary, even after taking this into account.

It is very important to know the size of the arrays you want to do this on; if it is small you should cache intermediates like triu_indices .

The fastest code for me was:

def triangalize_1(x):
    xs, ys = numpy.triu_indices(len(x), 1)
    return numpy.array([x[xs], x[ys]]).T

unless the x array is small.

If x is small, caching was best:

triu_cache = {}
def triangalize_1(x):
    if len(x) in triu_cache:
        xs, ys = triu_cache[len(x)]

    else:
        xs, ys = numpy.triu_indices(len(x), 1)
        triu_cache[len(x)] = xs, ys

    return numpy.array([x[xs], x[ys]]).T

I wouldn't do this for large x because of memory requirements.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM