
Explanation for the code that returns indices of unique values in 1D NumPy array

I found this snippet of code online and am having difficulty understanding what each part of it does, as I'm not proficient in Python.

The following routine takes an array as input and returns a dictionary that maps each unique value to its indices:

import numpy as np

def partition(array):
    return {i: (array == i).nonzero()[0] for i in np.unique(array)}

Trace each part out; this should speak for itself. Comments are inlined.

In [304]: array = np.array([1, 1, 2, 3, 2, 1, 2, 3])

In [305]: np.unique(array)            # unique values in `array`
Out[305]: array([1, 2, 3])

In [306]: array == 1                  # retrieve a boolean mask where elements are equal to 1
Out[306]: array([ True,  True, False, False, False,  True, False, False])

In [307]: (array == 1).nonzero()[0]   # get the `True` indices for the operation above
Out[307]: array([0, 1, 5])

In summary, the code creates a mapping of <unique_value: all indices of unique_value in array>:

In [308]: {i: (array == i).nonzero()[0] for i in np.unique(array)}
Out[308]: {1: array([0, 1, 5]), 2: array([2, 4, 6]), 3: array([3, 7])}

And here's a slightly more readable version:

In [313]: mapping = {}
     ...: for i in np.unique(array):
     ...:     mapping[i] = np.where(array == i)[0] 
     ...:     

In [314]: mapping
Out[314]: {1: array([0, 1, 5]), 2: array([2, 4, 6]), 3: array([3, 7])}
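Both versions above rescan the whole array once per unique value. As a rough sketch of a different technique (not the code from the question), the same mapping can be built from a single stable sort followed by a split. The name `partition_sorted` is made up for this illustration.

```python
import numpy as np

def partition_sorted(array):
    """Map each unique value to the ascending indices where it occurs."""
    array = np.asarray(array)
    # A stable sort keeps equal values in their original index order.
    order = np.argsort(array, kind="stable")
    # `starts` marks where each unique value begins in the sorted array.
    values, starts = np.unique(array[order], return_index=True)
    # Cut the sorted index list at those boundaries, one chunk per value.
    chunks = np.split(order, starts[1:])
    return dict(zip(values.tolist(), chunks))

array = np.array([1, 1, 2, 3, 2, 1, 2, 3])
mapping = partition_sorted(array)
print(mapping)
```

For the sample array this should give the same groups as the dictionary comprehension; whether it is actually faster depends on the array size and the number of unique values.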
  • array == i returns a boolean array that is True wherever the element equals i and False elsewhere.
  • nonzero() returns the indices of the elements that are non-zero (that is, not False), as a tuple with one index array per dimension. See https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.nonzero.html
  • nonzero()[0] takes the first element of that tuple: the array of all indices along the first (here, the only) axis where array[index] == i. It is not just the first matching index.
  • for i in np.unique(array) iterates over the unique values of array; in other words, it applies the logic above once per unique value.
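To make the role of the `[0]` concrete, here is a small sketch showing that `nonzero()` returns a tuple and that `[0]` picks an index array out of it. The 2D `grid` array is an extra example added only for illustration.

```python
import numpy as np

array = np.array([1, 1, 2, 3, 2, 1, 2, 3])
mask = array == 1

result = mask.nonzero()   # a tuple with one index array per dimension
indices = result[0]       # the index array for the first (here, only) axis
print(type(result), len(result), indices)

# For a 2D array the tuple has two entries: row indices and column indices.
grid = np.array([[1, 0], [0, 1]])
rows, cols = (grid == 1).nonzero()
print(rows, cols)
```

For a 1D input the tuple has exactly one entry, which is why the original code always indexes it with `[0]`.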

Consider also the following Pandas solution:

import pandas as pd

In [165]: s = pd.Series(array)

In [166]: d = s.groupby(s).groups

In [167]: d
Out[167]:
{1: Int64Index([0, 1, 5], dtype='int64'),
 2: Int64Index([2, 4, 6], dtype='int64'),
 3: Int64Index([3, 7], dtype='int64')}

P.S. pandas.Int64Index supports indexing and many of the same methods as a regular 1D NumPy array. (In pandas 2.0 and later, Int64Index no longer exists and the same values are shown as a plain Index with dtype int64.)

It can be easily converted to a NumPy array:

In [168]: {k:v.values for k,v in s.groupby(s).groups.items()}
Out[168]:
{1: array([0, 1, 5], dtype=int64),
 2: array([2, 4, 6], dtype=int64),
 3: array([3, 7], dtype=int64)}
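As a quick sanity check, the sketch below builds the mapping both ways and compares them. The variable names `numpy_map`, `pandas_map` and `same` are made up for this example, and it assumes the Series keeps its default integer index, since `groups` reports index labels rather than positions.

```python
import numpy as np
import pandas as pd

array = np.array([1, 1, 2, 3, 2, 1, 2, 3])

# NumPy version from the question.
numpy_map = {i: (array == i).nonzero()[0] for i in np.unique(array)}

# Pandas version, with each group's index converted to a NumPy array.
s = pd.Series(array)
pandas_map = {k: v.to_numpy() for k, v in s.groupby(s).groups.items()}

# Both mappings should hold the same keys and the same index arrays.
same = set(numpy_map) == set(pandas_map) and all(
    np.array_equal(numpy_map[k], pandas_map[k]) for k in numpy_map
)
print(same)
```

If the Series had a custom index, the two results would differ, because the Pandas groups would then contain labels instead of integer positions.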
