
Efficiently compute ordering permutations in numpy array

I've got a numpy array. What is the fastest way to compute all the ordering permutations?

What I mean is: given the first element in my array, I want a list of all the elements that sequentially follow it. Then, given the second element, a list of all the elements that follow it, and so on.

So, given my list, b, c, and d follow a; c and d follow b; and d follows c.

import numpy as np

x = np.array(["a", "b", "c", "d"])

So a potential output looks like:

[
    ["a","b"],
    ["a","c"],
    ["a","d"],

    ["b","c"],
    ["b","d"],

    ["c","d"],
]
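The requested pairs are exactly the 2-element combinations of x taken in array order, so an array of n elements yields n(n-1)/2 pairs. A minimal check with the standard library, assuming the same four-element x as above:

```python
from itertools import combinations

import numpy as np

x = np.array(["a", "b", "c", "d"])

# combinations() preserves input order, so each pair is (earlier, later)
pairs = [[str(p), str(q)] for p, q in combinations(x, 2)]

n = len(x)
# Number of "follows" pairs: n choose 2 = n * (n - 1) / 2
expected_count = n * (n - 1) // 2
```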

I will need to do this several million times, so I am looking for an efficient solution.

I tried something like:

im = np.vstack([x]*len(x))
a = np.vstack(([im], [im.T])).T
results = a[np.triu_indices(len(x),1)]

but it's actually slower than looping...
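The question does not show the loop it was compared against. A plain double loop might look like the following; the name pairs_loop is hypothetical, and this is only a reference baseline, not the asker's code:

```python
import numpy as np


def pairs_loop(x):
    """Hypothetical baseline: build the (earlier, later) pairs with a double loop."""
    n = len(x)
    # Preallocate one row per pair: n * (n - 1) / 2 rows, two columns
    out = np.empty((n * (n - 1) // 2, 2), dtype=x.dtype)
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            out[k, 0] = x[i]
            out[k, 1] = x[j]
            k += 1
    return out


x = np.array(["a", "b", "c", "d"])
result = pairs_loop(x)
```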

You can use itertools functions such as chain.from_iterable and combinations together with np.fromiter for this. This avoids an explicit Python loop (the iteration happens inside itertools and np.fromiter), but it is still not a pure NumPy solution:

>>> from itertools import combinations, chain
>>> arr = np.fromiter(chain.from_iterable(combinations(x, 2)), dtype=x.dtype)
>>> arr.reshape(arr.size // 2, 2)
array([['a', 'b'],
       ['a', 'c'],
       ['a', 'd'],
       ..., 
       ['b', 'c'],
       ['b', 'd'],
       ['c', 'd']], 
      dtype='|S1')
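On Python 3, arr.size/2 is a float and reshape rejects it; passing -1 lets NumPy infer the row count instead. np.fromiter also accepts a count argument, which lets it allocate the output once. Here count is n(n-1), two elements per pair. A sketch under those assumptions (the name pairs_fromiter is hypothetical):

```python
from itertools import chain, combinations

import numpy as np


def pairs_fromiter(x):
    n = len(x)
    # Each of the n*(n-1)/2 pairs contributes two elements to the flat stream
    flat = np.fromiter(
        chain.from_iterable(combinations(x, 2)),
        dtype=x.dtype,
        count=n * (n - 1),
    )
    # -1 lets NumPy infer the number of rows (avoids a float arr.size / 2 on Python 3)
    return flat.reshape(-1, 2)
```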

Timing comparisons:

>>> x = np.array(["a", "b", "c", "d"]*100)
>>> %%timeit
    im = np.vstack([x]*len(x))
    a = np.vstack(([im], [im.T])).T
    results = a[np.triu_indices(len(x),1)]
... 
10 loops, best of 3: 29.2 ms per loop
>>> %%timeit
    arr = np.fromiter(chain.from_iterable(combinations(x, 2)), dtype=x.dtype)
    arr.reshape(arr.size // 2, 2)
... 
100 loops, best of 3: 6.63 ms per loop
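The timings above come from an IPython session on one machine. To rerun the comparison outside IPython, a plain timeit harness could look like this; the function names are hypothetical wrappers around the two snippets above, and no particular result is implied:

```python
import timeit
from itertools import chain, combinations

import numpy as np

x = np.array(["a", "b", "c", "d"] * 100)


def vstack_triu(x):
    # The question's approach: build an (n, n, 2) grid, keep the strict upper triangle
    im = np.vstack([x] * len(x))
    a = np.vstack(([im], [im.T])).T
    return a[np.triu_indices(len(x), 1)]


def itertools_fromiter(x):
    # The itertools + np.fromiter approach, with reshape(-1, 2) for Python 3
    arr = np.fromiter(chain.from_iterable(combinations(x, 2)), dtype=x.dtype)
    return arr.reshape(-1, 2)


def timings(number=10):
    # Best of 3 repeats, reported as seconds per call
    candidates = [("vstack_triu", vstack_triu), ("itertools_fromiter", itertools_fromiter)]
    return {
        name: min(timeit.repeat(lambda f=f: f(x), number=number, repeat=3)) / number
        for name, f in candidates
    }


if __name__ == "__main__":
    for name, secs in timings().items():
        print(f"{name}: {secs * 1e3:.2f} ms per call")
```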

I've been browsing the NumPy source, and it seems the tri functions (tri, triu_indices and related) have had some very substantial improvements relatively recently. The file is pure Python, so you can just copy it into your own directory if that helps.

Even after taking this into account, my timings are completely different from Ashwini Chaudhary's.

It is very important to know the size of the arrays you want to do this on: if they are small, you should cache intermediates like triu_indices.

The fastest code for me was: 对我来说最快的代码是:

import numpy

def triangalize_1(x):
    # Index pairs (i, j) with i < j, i.e. the strict upper triangle
    xs, ys = numpy.triu_indices(len(x), 1)
    return numpy.array([x[xs], x[ys]]).T  # shape (n_pairs, 2)

unless the x array is small. 除非x数组很小。
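Note that numpy.array([...]).T returns a transposed view of a (2, n_pairs) array. If downstream code wants a C-contiguous (n_pairs, 2) result, one option is to index x with an (n_pairs, 2) index array directly. This is a sketch that has not been benchmarked against the version above, and the name triangalize_stack is hypothetical:

```python
import numpy as np


def triangalize_stack(x):
    xs, ys = np.triu_indices(len(x), 1)
    # (n_pairs, 2) array of index pairs; fancy indexing returns an (n_pairs, 2) array of elements
    idx = np.stack((xs, ys), axis=1)
    return x[idx]
```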

If x is small, caching was best: 如果x很小,缓存最好:

import numpy

# Maps array length -> (row indices, column indices) of the strict upper triangle
triu_cache = {}

def triangalize_1(x):
    if len(x) in triu_cache:
        xs, ys = triu_cache[len(x)]
    else:
        xs, ys = numpy.triu_indices(len(x), 1)
        triu_cache[len(x)] = xs, ys

    return numpy.array([x[xs], x[ys]]).T
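A variant of the dict cache above could use the standard library's functools.lru_cache instead; its maxsize bound caps how many distinct lengths are kept. This swaps in a different caching mechanism than the answer uses, and the names cached_triu_indices and pairs_cached are hypothetical:

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=32)
def cached_triu_indices(n):
    # Index arrays for all (i, j) with i < j; computed once per length n
    rows, cols = np.triu_indices(n, 1)
    # Mark read-only so callers cannot corrupt the shared cached arrays
    rows.flags.writeable = False
    cols.flags.writeable = False
    return rows, cols


def pairs_cached(x):
    rows, cols = cached_triu_indices(len(x))
    return np.column_stack((x[rows], x[cols]))
```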

I wouldn't do this for large x because of the memory requirements: the cache keeps two index arrays of n(n-1)/2 entries each for every length it has seen.
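The memory cost per cached length can be estimated directly: triu_indices(n, 1) returns two integer index arrays with n(n-1)/2 entries each. A rough estimate, assuming the index dtype is NumPy's platform integer intp (8 bytes on typical 64-bit builds); the helper names are hypothetical:

```python
import numpy as np


def triu_cache_bytes(n, itemsize=np.dtype(np.intp).itemsize):
    # Two index arrays (rows and cols), each with n*(n-1)/2 entries
    pairs = n * (n - 1) // 2
    return 2 * pairs * itemsize


def measured_triu_bytes(n):
    # Actual size of the arrays triu_indices returns, for comparison
    return sum(a.nbytes for a in np.triu_indices(n, 1))
```

For example, with 8-byte indices, n = 400 gives 79,800 pairs and 1,276,800 bytes (about 1.2 MiB), while n = 10,000 gives 49,995,000 pairs and 799,920,000 bytes (about 0.8 GB).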
