Sorting an array with repeated values

Question

I have to sort an array whose values from 0 to 9 are repeated, and obtain the corresponding indices into the original vector. The input array is:

[3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4]

I would like to obtain the following order:

array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9], dtype=uint8)

Instead of:

array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 9, 9])

which is given by:

import numpy as np
a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
idx = np.argsort(a)  # indices that sort a
a[idx]               # the sorted values shown above

Is there a way to adapt this function to produce the order I want?

Answer 1

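l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 4, 3, 5]
l_ordered = []
while len(l) != 0:
    # one copy of every value still in l, in ascending order
    unique_nums = sorted(set(l))
    l_ordered.extend(unique_nums)
    # remove one occurrence of each value just used (this consumes l)
    for num in unique_nums:
        l.remove(num)

print(l_ordered)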

This results in:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, 5]

You can apply the same idea with NumPy, or convert the final result into a NumPy array.
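Following the suggestion to apply the same idea with NumPy, here is a minimal sketch (the name `rounds_sort` is hypothetical, not from the answer). Each pass takes the sorted distinct values still present, then deletes the first occurrence of each of them with `np.delete`, which returns a new array, so the caller's input is left untouched.

```python
import numpy as np

def rounds_sort(values):
    """Emit ascending runs of distinct values until every element is used."""
    remaining = np.asarray(values)
    rounds = []
    while remaining.size:
        # Sorted distinct values and the index of each one's first occurrence
        uniq, first_idx = np.unique(remaining, return_index=True)
        rounds.append(uniq)
        # Drop exactly one occurrence of every value taken in this round
        remaining = np.delete(remaining, first_idx)
    return np.concatenate(rounds) if rounds else remaining

print(rounds_sort([1, 2, 3, 4, 5, 6, 7, 8, 9, 4, 3, 5]))
# [1 2 3 4 5 6 7 8 9 3 4 5]
```

Like the list version, it does one full pass per repetition level, so it slows down when a single value is repeated many times.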

Answer 2

Very interesting task! Here is my attempt at solving the problem:

import numpy as np

def groupsort(a: np.ndarray):
    uniques, counts = np.unique(a, return_counts=True)
    min_count = int(np.min(counts))  # Use this to solve the easy case
    counts -= min_count
    rest = []
    i = 0
    while any(counts):  # Hard case: cycle over the leftover counts
        if counts[i]:
            rest.append(uniques[i])  # the value itself, not its position
            counts[i] -= 1
        i = (i + 1) % len(counts)  # wrap around all unique values, not a fixed 3
    return np.array(list(uniques) * min_count + rest)

a = np.array(list(range(4)) * 2 + [1,1,1,1,1,0,0,0,2,2])
np.random.shuffle(a)
print(a)
# the 18 values of a in random order
print(groupsort(a))
# [0 1 2 3 0 1 2 3 0 1 2 0 1 2 0 1 1 1]

The idea is to split the problem into two cases: an easy one and a hard one. The easy case handles inputs like a = [0,1,2,3,0,1,2,3], where every unique value occurs the same number of times. Then you can simply count the number n of occurrences of one value (e.g. 0) and do list(range(max(a) + 1)) * n, provided the values are exactly 0 to max(a).

The hard case handles inputs such as a = [1,1,1,1,1,0,0,0,2,2]. Here the idea is to get the count of each of the values 0 to 3 (3 does not occur in this input), in this case counts = [3,5,2,0], and then do:

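counts = [3, 5, 2, 0]  # leftover counts of the values 0, 1, 2, 3
rest = []
i = 0
while any(counts):
    if counts[i]:
        rest.append(i)
        counts[i] -= 1
    i = (i + 1) % len(counts)  # visit every value, not just the first 3
# rest == [0, 1, 2, 0, 1, 2, 0, 1, 1, 1]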

In my solution you can see that I have combined the two approaches.

Answer 3

For arrays with only a few elements, @Mr.O's answer is faster. The code below is faster once arr holds more than about 100 ints.

import numpy as np

def sort_groups(arr):
    ct = np.ones(len(arr), dtype=np.int64)
    for i in set(arr):
        # number each occurrence of i: 1 for the first copy, 2 for the second, ...
        ct[arr == i] = ct[arr == i].cumsum()
    # ct now holds an occurrence rank for each int in arr
    tosort = (arr.max() + 1) * ct + arr  # assumes arr is non-negative
    # tosort orders by ct first, then by the value in arr when the ct's are equal
    return arr[np.argsort(tosort)]

a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
sort_groups(a)
# array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])

Breaking the function apart to see what's happening:

arr = a

ct = np.ones(len(arr), dtype=np.int64)
for i in set(arr):
    ct[arr == i] = ct[arr == i].cumsum()

arr, ct
# (array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4]),
#  array([1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2]))

tosort = (arr.max() + 1) * ct + arr  # Assumes arr is non-negative

tosort
# array([13, 11, 12, 10, 14, 15, 16, 17, 21, 20, 19, 25, 23, 29, 22, 27, 26, 24])

arr[np.argsort(tosort)]
# array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])
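The question also asks for the original indices, and `np.argsort(tosort)` is exactly that permutation. The sketch below (the name `grouped_argsort` is hypothetical) computes the same occurrence rank without the per-value Python loop, and swaps the combined `(arr.max() + 1) * ct + arr` key for `np.lexsort` on (rank, value), so it does not rely on the values being non-negative. Treat it as one possible variant, not as the answer's method.

```python
import numpy as np

def grouped_argsort(arr):
    """Indices that put arr into ascending runs: 1st copies, then 2nd copies, ..."""
    arr = np.asarray(arr)
    # Stable sort keeps equal values in their original order
    order = np.argsort(arr, kind="stable")
    sorted_vals = arr[order]
    # True where a new run of equal values starts in the sorted array
    run_start = np.concatenate(([True], sorted_vals[1:] != sorted_vals[:-1]))
    run_id = np.cumsum(run_start) - 1
    start_pos = np.flatnonzero(run_start)
    # Occurrence rank (0 for the first copy, 1 for the second, ...) in sorted order
    rank_sorted = np.arange(arr.size) - start_pos[run_id]
    # Scatter the ranks back to the original positions (this is ct - 1 above)
    rank = np.empty(arr.size, dtype=np.intp)
    rank[order] = rank_sorted
    # np.lexsort sorts by the LAST key first: by rank, then by value
    return np.lexsort((arr, rank))

a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
idx = grouped_argsort(a)
print(idx)
# [ 3  1  2  0  4  5  6  7 10  9  8 14 12 17 11 16 15 13]
print(a[idx])
# [0 1 2 3 4 5 6 7 9 0 1 2 3 4 5 6 7 9]
```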

Answer 4

Here's a two-liner:

import numpy as np

unique, counts = np.unique(a, return_counts=True)
# Round n takes every value that occurs more than n times, in ascending order
b = [x for y in [[u for i, u in enumerate(unique) if counts[i] > n] for n in range(counts.max())] for x in y]

Output (for an input in which 1, 5 and 9 occur three times and the other values twice):

>>> b
[0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9, 1, 5, 9]
#^ reset                    ^ reset                    ^ reset
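The comprehension relies on one observation: value `u` appears in round `n` (counting from 0) exactly when `counts[u] > n`. A minimal sketch of the same rule using boolean masks instead of `enumerate` (the name `rounds_by_count` is hypothetical):

```python
import numpy as np

def rounds_by_count(a):
    unique, counts = np.unique(a, return_counts=True)  # unique is sorted ascending
    # Round n keeps the values that have more than n copies
    rounds = [unique[counts > n] for n in range(counts.max(initial=0))]
    return np.concatenate(rounds) if rounds else unique

a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
print(rounds_by_count(a))
# [0 1 2 3 4 5 6 7 9 0 1 2 3 4 5 6 7 9]
```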

Answer 5

I prefer using np.bincount instead of np.unique, np.sort or np.argsort because it is much faster when the maximum value in the data is small.

import numpy as np

def count_out(arr):
    bins = np.bincount(arr, minlength=N)          # occurrences of each value 0..N-1
    threshold_idx = np.unique(bins[bins != 0])    # distinct non-zero counts, ascending
    counts = np.diff(threshold_idx, prepend=0)    # how many rows each unique mask covers
    mask = (bins >= threshold_idx[:, None])       # one unique mask per threshold
    full_mask = np.repeat(mask, counts, axis=0)
    blocks = np.repeat([np.arange(N)], np.sum(counts), axis=0)
    return blocks[full_mask]

N = 10
X = np.array([3, 5, 1, 3, 9, 7, 5, 9, 0, 9, 9, 0, 0, 6, 8, 8, 8, 9, 7, 0, 5, 9, 7, 8, 3, 1, 5, 8, 8, 1, 0, 7, 1, 9, 9, 8])
print(X)
# [3 5 1 3 9 7 5 9 0 9 9 0 0 6 8 8 8 9 7 0 5 9 7 8 3 1 5 8 8 1 0 7 1 9 9 8]
print(count_out(X))
# [0 1 3 5 6 7 8 9 0 1 3 5 7 8 9 0 1 3 5 7 8 9 0 1 5 7 8 9 0 8 9 8 9 8 9 9]

The key idea is to find counts, i.e. how many times each block repeats, and then create a unique mask for each block:

Blocks:

[[0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]]

Unique masks:

[[1 1 0 1 0 1 1 1 1 1]
 [1 1 0 1 0 1 0 1 1 1]
 [1 1 0 0 0 1 0 1 1 1]
 [1 0 0 0 0 0 0 0 1 1]
 [0 0 0 0 0 0 0 0 1 1]
 [0 0 0 0 0 0 0 0 0 1]]

Finally, reconstruct the full mask by repeating each unique mask according to the counts we've got.

Counts: [1 2 1 1 2 1]

Full masks:

[[1 1 0 1 0 1 1 1 1 1]
 [1 1 0 1 0 1 0 1 1 1]
 [1 1 0 1 0 1 0 1 1 1]
 [1 1 0 0 0 1 0 1 1 1]
 [1 0 0 0 0 0 0 0 1 1]
 [0 0 0 0 0 0 0 0 1 1]
 [0 0 0 0 0 0 0 0 1 1]
 [0 0 0 0 0 0 0 0 0 1]]
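As a check on the numbers above: for the 36-value array shown in the printed output, the per-value bin counts for 0 to 9 are [5, 4, 0, 3, 0, 4, 1, 4, 7, 8]. The distinct non-zero counts are the thresholds, and their successive differences say how many rows each unique mask covers. This small sketch reproduces just that arithmetic:

```python
import numpy as np

bins = np.array([5, 4, 0, 3, 0, 4, 1, 4, 7, 8])   # occurrences of 0..9 in the example
threshold_idx = np.unique(bins[bins != 0])          # [1 3 4 5 7 8]
counts = np.diff(threshold_idx, prepend=0)          # [1 2 1 1 2 1]
mask = bins >= threshold_idx[:, None]               # the 6 unique masks
full_mask = np.repeat(mask, counts, axis=0)         # 8 rows, one per round
print(full_mask.astype(int))
```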

By the way, it seems this can be optimised further. First, creating the repeated blocks is redundant, since there should be a way to refer to a single block instead of copying it. Second, it is slow when the full mask is sparse; in that case you should consider implementing your own way of repeating blocks without masking.

I hope it's helpful for you at this point.
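On the first optimisation point, one way to avoid materialising the repeated blocks is `np.broadcast_to`, which returns a read-only view in which every row shares the same `np.arange` buffer; boolean indexing then copies out only the selected entries. A minimal sketch (the name `count_out_view` is hypothetical, and `N` is passed in instead of being a global):

```python
import numpy as np

def count_out_view(arr, N):
    bins = np.bincount(arr, minlength=N)
    threshold_idx = np.unique(bins[bins != 0])
    counts = np.diff(threshold_idx, prepend=0)
    mask = bins >= threshold_idx[:, None]
    full_mask = np.repeat(mask, counts, axis=0)
    # One arange shared by every row: a view, not a stack of copies
    blocks = np.broadcast_to(np.arange(bins.size), full_mask.shape)
    return blocks[full_mask]
```

Since each row of `blocks` is just the column index, `np.nonzero(full_mask)[1]` yields the same values in the same row-major order without building `blocks` at all.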

Answer 6

IIUC, you want a duplicated, sorted array.

Remove the duplicated values using numpy.unique (which also returns them sorted), then tile to the expected size:

import numpy as np

a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
b = np.unique(a)
b = np.tile(b, len(a)//len(b))

Output:

array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])
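Tiling only reproduces the input when every value occurs the same number of times, as in the question's array. With unequal counts, `len(a)//len(b)` silently returns a different multiset of values (for `[0, 0, 1]` it gives `[0, 1]`). A minimal guarded variant, a sketch rather than the answer's code (the name `tile_sorted` is hypothetical), takes the repeat count from `np.unique` and refuses inputs it cannot handle:

```python
import numpy as np

def tile_sorted(a):
    uniques, counts = np.unique(a, return_counts=True)  # uniques come back sorted
    if counts.size and not np.all(counts == counts[0]):
        raise ValueError("values occur unequally often; tiling would lose elements")
    reps = counts[0] if counts.size else 0
    return np.tile(uniques, reps)

print(tile_sorted(np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])))
# [0 1 2 3 4 5 6 7 9 0 1 2 3 4 5 6 7 9]
```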

Question description

6 answers

Answer 1
0 2021-12-12 19:27:27

Answer 2
0 2021-12-12 20:21:42

Answer 3
0 2021-12-12 21:11:45

Answer 4
0 2021-12-13 00:49:23

Answer 5
0 2021-12-13 04:06:47

Answer 6
-1 2021-12-12 19:20:37
