Efficiently compute ordering permutations in numpy array
I've got a numpy array. What is the fastest way to compute all the permutations of orderings?
What I mean is: given the first element in my array, I want a list of all the elements that sequentially follow it. Then, given the second element, a list of all the elements that follow it, and so on.
So for the list below, b, c, and d follow a; c and d follow b; and d follows c.
x = np.array(["a", "b", "c", "d"])
So a potential output looks like:
[
["a","b"],
["a","c"],
["a","d"],
["b","c"],
["b","d"],
["c","d"],
]
I will need to do this several million times, so I am looking for an efficient solution.
I tried something like:
im = np.vstack([x]*len(x))
a = np.vstack(([im], [im.T])).T
results = a[np.triu_indices(len(x),1)]
but it's actually slower than looping...
You can use itertools functions such as chain.from_iterable and combinations together with np.fromiter for this. This involves no explicit loop in Python, but it is still not a pure NumPy solution:
>>> import numpy as np
>>> from itertools import combinations, chain
>>> arr = np.fromiter(chain.from_iterable(combinations(x, 2)), dtype=x.dtype)
>>> arr.reshape(arr.size // 2, 2)  # floor division so the shape stays an integer
array([['a', 'b'],
['a', 'c'],
['a', 'd'],
...,
['b', 'c'],
['b', 'd'],
['c', 'd']],
dtype='|S1')
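The same idea can be wrapped in a small function. This is only a sketch: the name ordered_pairs_itertools is made up for illustration, and it assumes x is a 1-D array with a fixed-size dtype (such as the single-character strings used here). Passing -1 to reshape lets NumPy infer the number of rows, which sidesteps the division entirely.

```python
from itertools import chain, combinations

import numpy as np


def ordered_pairs_itertools(x):
    """Return every (earlier, later) pair of x as a 2-column array.

    Hypothetical helper name; assumes x is 1-D with a fixed-size dtype.
    """
    # Flatten the 2-tuples produced by combinations() into one stream of scalars.
    flat = np.fromiter(chain.from_iterable(combinations(x, 2)), dtype=x.dtype)
    # -1 lets NumPy infer the row count from the total size.
    return flat.reshape(-1, 2)


x = np.array(["a", "b", "c", "d"])
pairs = ordered_pairs_itertools(x)
print(pairs)
```

For an input of length n the result has n * (n - 1) / 2 rows, in the same order as the example output in the question.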
Timing comparisons:
>>> x = np.array(["a", "b", "c", "d"]*100)
>>> %%timeit
im = np.vstack([x]*len(x))
a = np.vstack(([im], [im.T])).T
results = a[np.triu_indices(len(x),1)]
...
10 loops, best of 3: 29.2 ms per loop
>>> %%timeit
arr = np.fromiter(chain.from_iterable(combinations(x, 2)), dtype=x.dtype)
arr.reshape(arr.size // 2, 2)
...
100 loops, best of 3: 6.63 ms per loop
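To rerun this comparison outside IPython, something like the following should work. It is a rough sketch using the standard timeit module; the function names stacked and via_itertools are placeholders of mine, and the numbers it prints will depend on your machine and NumPy version, so they may not match the figures above.

```python
import timeit
from itertools import chain, combinations

import numpy as np

x = np.array(["a", "b", "c", "d"] * 100)


def stacked(x):
    # The approach from the question: build an n x n x 2 grid of pairs,
    # then keep only the entries above the diagonal.
    im = np.vstack([x] * len(x))
    a = np.vstack(([im], [im.T])).T
    return a[np.triu_indices(len(x), 1)]


def via_itertools(x):
    # The itertools approach: stream the pairs into a flat array and reshape.
    flat = np.fromiter(chain.from_iterable(combinations(x, 2)), dtype=x.dtype)
    return flat.reshape(-1, 2)


for fn in (stacked, via_itertools):
    # Best of 3 repeats, 10 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(x), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```

Both functions return the pairs in the same order, so their outputs can be compared directly with np.array_equal as a sanity check before timing.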
I've been browsing the source and it seems the tri functions have had some very substantial improvements relatively recently. The file is all Python, so you can just copy it into your directory if that helps.
I seem to get completely different timings from Ashwini Chaudhary, even after taking this into account.
It is very important to know the size of the arrays you want to do this on; if they are small, you should cache intermediates like triu_indices.
The fastest code for me was:
import numpy

def triangalize_1(x):
    # Indices of the strict upper triangle, i.e. every pair with xs < ys.
    xs, ys = numpy.triu_indices(len(x), 1)
    return numpy.array([x[xs], x[ys]]).T
unless the x array is small.
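As a quick check, here is the same triu_indices idea applied to the array from the question. This is a sketch rather than the exact code above: pairs_from_triu is a made-up name, and it uses np.column_stack to join the two columns instead of building an array and transposing it, which gives the same result.

```python
import numpy as np


def pairs_from_triu(x):
    # Row and column indices of the strict upper triangle: all i < j.
    i, j = np.triu_indices(len(x), 1)
    # Index x once per column, then place the two columns side by side.
    return np.column_stack((x[i], x[j]))


x = np.array(["a", "b", "c", "d"])
result = pairs_from_triu(x)
print(result)
```

Because triu_indices walks the upper triangle row by row, the pairs come out in the same order as itertools.combinations(x, 2).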
If x is small, caching was best:
import numpy

# Maps len(x) to the precomputed (xs, ys) index arrays for that length.
triu_cache = {}

def triangalize_1(x):
    if len(x) in triu_cache:
        xs, ys = triu_cache[len(x)]
    else:
        xs, ys = numpy.triu_indices(len(x), 1)
        triu_cache[len(x)] = xs, ys
    return numpy.array([x[xs], x[ys]]).T
I wouldn't do this for large x because of memory requirements.
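One way to keep the cache from growing without limit is to bound it, for example with functools.lru_cache from the standard library. The sketch below is an assumption on my part, not something I timed: the names cached_triu_indices and pairs_cached and the maxsize of 32 are arbitrary. Each cached entry still holds two integer arrays of n * (n - 1) / 2 elements, so the memory concern for large x remains.

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=32)  # arbitrary bound; tune it to the lengths you actually see
def cached_triu_indices(n):
    # Computed once per distinct length n, then reused on later calls.
    return np.triu_indices(n, 1)


def pairs_cached(x):
    i, j = cached_triu_indices(len(x))
    return np.column_stack((x[i], x[j]))


x = np.array(["a", "b", "c", "d"])
first = pairs_cached(x)   # computes and stores the indices for length 4
second = pairs_cached(x)  # reuses the stored indices
```

The cached index arrays are shared between calls, so they should be treated as read-only.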