
Cartesian product to get set of indices to point to unique elements in NumPy array

What's a good way to get the combinations of indices that point to unique elements in an array? For example, for a = [1,1,3,2], the possible sets of pointers would be {0,2,3} and {1,2,3}.

I can use argsort, split the sorted indices by the frequency of each element, and then use something like itertools.product to get all the sets of indices I want.

This is what I tried:

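from itertools import product
from numpy import array, split, unique

a = array([1, 1, 3, 2])
# Count of each unique value (scipy.stats.itemfreq, used originally, was removed from recent SciPy releases)
fq = unique(a, return_counts=True)[1]
# Running totals of the counts: the positions at which to split the sorted indices
fq = [int(f + sum(fq[:i])) for i, f in enumerate(fq)]
# The last split point equals len(a) and leaves an empty tail, hence the len(ptrs) filter
print(list(product(*(ptrs.tolist() for ptrs in split(a.argsort(), fq) if len(ptrs)))))
#> [(0, 3, 2), (1, 3, 2)]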

How can I do this better?

This does get you the indices, but possibly not in the format you want:

[np.where(a==x) for x in np.unique(a)]

[(array([0, 1]),), (array([3]),), (array([2]),)]

I imagine there is a better way that avoids the for loop.
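A small variation on the snippet above, offered as a sketch rather than a fix: np.flatnonzero returns a flat index array directly, so the result avoids the one-element tuples that np.where produces. It still loops over the unique values. The helper name indices_per_value is made up for illustration.

```python
import numpy as np

def indices_per_value(a):
    # Hypothetical helper: one flat index array per unique value,
    # in the sorted order that np.unique returns.
    a = np.asarray(a)
    return [np.flatnonzero(a == x) for x in np.unique(a)]

groups = indices_per_value([1, 1, 3, 2])
```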

@atomh33ls's answer can be vectorized as follows.

First, extract the inverse indices and the count of each unique item. If you are using numpy >= 1.9:

_, idx, cnt = np.unique(a, return_inverse=True, return_counts=True)

In older versions, this does the same:

_, idx = np.unique(a, return_inverse=True)
cnt = np.bincount(idx)
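As a quick sanity check (a sketch, assuming NumPy 1.9 or later so that both forms are available), the two ways of computing the counts can be compared directly on the example array:

```python
import numpy as np

a = np.array([1, 1, 3, 2])

# Newer form: inverse indices and counts in a single call
_, idx_new, cnt_new = np.unique(a, return_inverse=True, return_counts=True)

# Older form: counts recovered from the inverse indices
_, idx_old = np.unique(a, return_inverse=True)
cnt_old = np.bincount(idx_old)

# True when both forms agree on the inverse indices and the counts
same = np.array_equal(idx_new, idx_old) and np.array_equal(cnt_new, cnt_old)
```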

And now, a little bit of magic and, voila:

>>> np.split(np.arange(len(a))[np.argsort(idx)], np.cumsum(cnt)[:-1])
[array([0, 1]), array([3]), array([2])]
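To get from these groups to the index combinations asked for in the question, the groups can be fed to itertools.product. The following is a minimal sketch, not part of the original answer; the function name unique_index_combinations is hypothetical, and kind="stable" assumes a NumPy version that accepts that option (older versions would need kind="mergesort").

```python
from itertools import product
import numpy as np

def unique_index_combinations(a):
    # Hypothetical helper combining the grouping above with itertools.product.
    a = np.asarray(a)
    _, idx, cnt = np.unique(a, return_inverse=True, return_counts=True)
    # A stable sort keeps the indices inside each group in ascending order.
    order = np.argsort(idx, kind="stable")
    # Cut the sorted indices at the boundaries between unique values.
    groups = np.split(order, np.cumsum(cnt)[:-1])
    # Pick one index from each group; tolist() yields plain Python ints.
    return list(product(*(g.tolist() for g in groups)))

combos = unique_index_combinations([1, 1, 3, 2])
```

For a = [1, 1, 3, 2] this yields (0, 3, 2) and (1, 3, 2), the same index sets as in the question, with each tuple ordered by unique value rather than by position. The number of combinations is the product of the counts, so it can grow quickly when many values repeat.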
