Split a NumPy array into subarrays according to the values (not sorted, but grouped) of another array

Suppose I have two NumPy arrays:

x = [[1, 2, 8],
     [2, 9, 1],
     [3, 8, 9],
     [4, 3, 5],
     [5, 2, 3],
     [6, 4, 7],
     [7, 2, 3],
     [8, 2, 2],
     [9, 5, 3],
     [10, 2, 3],
     [11, 2, 4]]
y = [0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0] 

Note: the values in x are not sorted in any way; I chose this example only because it illustrates the problem well. These are just example arrays: x and y can hold arbitrarily many different values, but x always has as many rows as y has entries.

I want to efficiently split the array x into sub-arrays according to the values in y.

My desired output would be:

z_0 = [[1, 2, 8],
       [2, 9, 1],
       [4, 3, 5],
       [10, 2, 3],
       [11, 2, 4]]
z_1 = [[3, 8, 9],
       [5, 2, 3],
       [6, 4, 7]]
z_2 = [[7, 2, 3],
       [8, 2, 2],
       [9, 5, 3]]

Assuming that y starts at zero and is grouped but not sorted, what is the most efficient way to do this?

Note: this is the unsorted counterpart of the question "Split a NumPy array into subarrays according to the values (sorted in ascending order) of another array".

One way to solve this is to build a list of indices for each y value and then select those rows of x. For example:

z_0 = x[[i for i, v in enumerate(y) if v == 0]]
z_1 = x[[i for i, v in enumerate(y) if v == 1]]
z_2 = x[[i for i, v in enumerate(y) if v == 2]]

Output:

array([[ 1,  2,  8],
       [ 2,  9,  1],
       [ 4,  3,  5],
       [10,  2,  3],
       [11,  2,  4]])
array([[3, 8, 9],
       [5, 2, 3],
       [6, 4, 7]])
array([[7, 2, 3],
       [8, 2, 2],
       [9, 5, 3]])
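
The index lists above can also be built by NumPy rather than a Python-level loop. The following is a minimal sketch, assuming x and y have already been converted to NumPy arrays; the name idx_0 is introduced here only for illustration.

```python
import numpy as np

x = np.array([[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5], [5, 2, 3], [6, 4, 7],
              [7, 2, 3], [8, 2, 2], [9, 5, 3], [10, 2, 3], [11, 2, 4]])
y = np.array([0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0])

# Positions in y where the label equals 0 (hypothetical helper name).
idx_0 = np.flatnonzero(y == 0)

# Integer-array indexing selects the matching rows of x.
z_0 = x[idx_0]
```

The same pattern applies to any other label value by changing the comparison.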

If you want to be more generic and support different sets of numbers in y, you could use a list comprehension to produce a list of arrays, e.g.:

z = [x[[i for i, v in enumerate(y) if v == m]] for m in set(y)]

Output:

[array([[ 1,  2,  8],
       [ 2,  9,  1],
       [ 4,  3,  5],
       [10,  2,  3],
       [11,  2,  4]]),
 array([[3, 8, 9],
       [5, 2, 3],
       [6, 4, 7]]),
 array([[7, 2, 3],
       [8, 2, 2],
       [9, 5, 3]])]

If y is also an np.array with the same length as x, you can simplify this by using boolean indexing:

z = [x[y==m] for m in set(y)]

Output is the same as above.
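
Iterating over a Python set does not guarantee any particular order, so the position of each sub-array in the resulting list is not tied to its label. One possible variation, sketched here under the assumption that both inputs are NumPy arrays, uses np.unique (which returns the distinct labels in sorted order) and keys the result by label:

```python
import numpy as np

x = np.array([[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5], [5, 2, 3], [6, 4, 7],
              [7, 2, 3], [8, 2, 2], [9, 5, 3], [10, 2, 3], [11, 2, 4]])
y = np.array([0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0])

# np.unique returns the distinct labels sorted in ascending order.
labels = np.unique(y)

# Map each label to the rows of x whose entry in y carries that label.
z = {int(m): x[y == m] for m in labels}
```

A dictionary makes the label of each group explicit, which also helps when the labels are not contiguous integers.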

Just use a list comprehension and boolean indexing:

import numpy as np

x = np.array(x)
y = np.array(y)

z = [x[y == i] for i in range(y.max() + 1)]

z
Out[]: 
[array([[ 1,  2,  8],
        [ 2,  9,  1],
        [ 4,  3,  5],
        [10,  2,  3],
        [11,  2,  4]]),
 array([[3, 8, 9],
        [5, 2, 3],
        [6, 4, 7]]),
 array([[7, 2, 3],
        [8, 2, 2],
        [9, 5, 3]])]
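
The comprehension above scans y once per label. When there are many distinct labels, a sort-based split may be worth trying instead. The following is a sketch of that alternative (not taken from the answers here), assuming NumPy arrays as input; the names order, sorted_y and boundaries are illustrative.

```python
import numpy as np

x = np.array([[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5], [5, 2, 3], [6, 4, 7],
              [7, 2, 3], [8, 2, 2], [9, 5, 3], [10, 2, 3], [11, 2, 4]])
y = np.array([0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0])

# A stable sort keeps rows with equal labels in their original order.
order = np.argsort(y, kind="stable")
sorted_y = y[order]

# Positions where the sorted label changes mark the group boundaries.
boundaries = np.flatnonzero(np.diff(sorted_y)) + 1

# Split the reordered rows at those boundaries.
z = np.split(x[order], boundaries)
```

Unlike the range(y.max() + 1) version, this produces one sub-array per label that actually occurs in y, so a label that is missing from y yields no empty array.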

Slight variation.

from operator import itemgetter
label = itemgetter(1)

Associate each label with its implied index, giving (index, label) pairs:

y1 = [thing for thing in enumerate(y)]

Sort on the label:

y1.sort(key=label)

Group by label and construct the results:

import itertools
d = {}
for key,group in itertools.groupby(y1,label):
    d[f'z{key}'] = [x[i] for i,k in group]
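
itertools.groupby only merges adjacent equal keys, which is why the sort step is needed above. If the sort is not wanted, a single pass with collections.defaultdict is a possible alternative. This is a different technique from the one shown, sketched here with x and y as plain Python lists; the name groups is illustrative.

```python
from collections import defaultdict

x = [[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5], [5, 2, 3], [6, 4, 7],
     [7, 2, 3], [8, 2, 2], [9, 5, 3], [10, 2, 3], [11, 2, 4]]
y = [0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0]

# Append each row to the list belonging to its label, in one pass.
groups = defaultdict(list)
for row, label in zip(x, y):
    groups[label].append(row)
```

Rows keep their original relative order within each group, and the keys appear in order of first occurrence in y.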

Pandas solution:

>>> import pandas as pd
>>> df = pd.DataFrame({'points':[thing for thing in x],'cat':y})
>>> z = df.groupby('cat').agg(list)
>>> z       
                                                points
cat
0    [[1, 2, 8], [2, 9, 1], [4, 3, 5], [10, 2, 3], ...
1                    [[3, 8, 9], [5, 2, 3], [6, 4, 7]]
2                    [[7, 2, 3], [8, 2, 2], [9, 5, 3]]
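
The result above stores each group as a list inside a single DataFrame cell. If NumPy sub-arrays are wanted instead, one option is to iterate over the groups directly. This is a minimal sketch, assuming NumPy array inputs; the column name "cat" follows the example above and the rest is illustrative.

```python
import numpy as np
import pandas as pd

x = np.array([[1, 2, 8], [2, 9, 1], [3, 8, 9], [4, 3, 5], [5, 2, 3], [6, 4, 7],
              [7, 2, 3], [8, 2, 2], [9, 5, 3], [10, 2, 3], [11, 2, 4]])
y = np.array([0, 0, 1, 0, 1, 1, 2, 2, 2, 0, 0])

# One DataFrame column per column of x, plus a label column.
df = pd.DataFrame(x)
df["cat"] = y

# Iterating a groupby yields (label, sub-frame) pairs; drop the label
# column and convert each sub-frame back to a NumPy array.
z = {int(k): g.drop(columns="cat").to_numpy() for k, g in df.groupby("cat")}
```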
