
Sort array with repeated values

I have to sort an array of repeated values from 0 to 9 and obtain the original indices of the sorted elements. The input array is:

[3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4]

I would like to obtain the following order:

array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9], dtype=uint8)

Instead of:

array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 9, 9])

which is the order given by:

import numpy as np
a = [3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4]
np.argsort(a)

Is there a way to manipulate this function?

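l = [1,2,3,4,5,6,7,8,9,4,3,5]
l_ordered = []
while len(l) != 0:
    # Each pass takes one copy of every value that is still present
    unique_nums = list(set(l))
    unique_nums.sort()
    l_ordered.extend(unique_nums)
    # Remove the copies that were just emitted
    for num in unique_nums:
        l.remove(num)

print(l_ordered)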

This results in:

[1, 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, 5]

You can apply the same idea with NumPy, or convert the final result into a NumPy array.
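The same pass-by-pass idea can be sketched without the repeated list.remove calls, which rescan the list on every removal. This is only one possible variant, assuming hashable, sortable values; the name round_robin_sorted is made up for illustration and does not come from the answer above.

```python
from collections import Counter

import numpy as np


def round_robin_sorted(values):
    """Emit one copy of each remaining value per pass, in ascending order."""
    remaining = Counter(values)  # value -> copies still to emit
    ordered = []
    while remaining:
        # sorted() builds a new list, so deleting keys inside the loop is safe
        for v in sorted(remaining):
            ordered.append(v)
            remaining[v] -= 1
            if remaining[v] == 0:
                del remaining[v]
    return np.array(ordered)


print(round_robin_sorted([1, 2, 3, 4, 5, 6, 7, 8, 9, 4, 3, 5]))
# [1 2 3 4 5 6 7 8 9 3 4 5]
```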

Very interesting task! Here is my attempt to solve the problem:

import numpy as np

def groupsort(a: np.ndarray):
    uniques, counts = np.unique(a, return_counts=True)
    min_count = np.min(counts) # Use this to solve easy case
    counts -= min_count
    rest = []
    i = 0
    while any(counts): # Hard case
        if counts[i]:
            rest.append(uniques[i])  # append the value, not its index
            counts[i] -= 1
        i = (i + 1) % len(counts)  # cycle over all unique values
    return np.array(list(uniques) * min_count + rest)

a = np.array(list(range(4)) * 2 + [1,1,1,1,1,0,0,0,2,2])
np.random.shuffle(a)
print(a)
# prints the 18 values in shuffled (random) order
print(groupsort(a))
# [0 1 2 3 0 1 2 3 0 1 2 0 1 2 0 1 1 1]

The idea is to split the problem into two cases: one easy and one hard. The easy case handles inputs like a = [0,1,2,3,0,1,2,3], where the counts for each unique value are equal. Then you can simply count the number n of occurrences of a specific value (e.g. 0) and do list(range(max(a) + 1)) * n.

The hard case handles inputs such as a = [1,1,1,1,1,0,0,0,2,2]. The idea is to get the counts of each value, in this case counts = [3,5,2,0] (the trailing 0 belongs to the value 3, which has nothing left over in the full example above). Then do:

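rest = []
i = 0
while any(counts):
    if counts[i]:
        rest.append(i)  # here the index i is also the value, since the values are 0..len(counts)-1
        counts[i] -= 1
    i = (i + 1) % len(counts)  # wrap around after the last unique value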

In my solution you can see that I have combined the two cases.

For an array with a small number of elements, @Mr.O's answer is faster. The code below is faster if there are more than around 100 ints in arr.

import numpy as np

def sort_groups( arr ):
    ct = np.ones( len(arr), dtype = np.int64 )
    for i in set( arr ):
        ct[arr == i] = ct[arr == i ].cumsum()
    # ct is the occurrence rank of each int in arr (1 = first occurrence, 2 = second, ...)
    tosort = ( arr.max() + 1 ) * ct + arr 
    # tosort orders by ct first, then by arr when the ct's are equal
    return arr[ np.argsort( tosort ) ]
     
a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
sort_groups( a )
# array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])

Breaking the function out to see what's happening:

arr = a

ct = np.ones( len(arr), dtype = np.int64 )
for i in set( arr ):
    ct[arr == i] = ct[arr == i ].cumsum()

arr, ct
# (array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4]),
#  array([1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2]))

tosort = ( arr.max() + 1 ) * ct + arr  # Assumes arr is >= 0

tosort
# array([13, 11, 12, 10, 14, 15, 16, 17, 21, 20, 19, 25, 23, 29, 22, 27, 26, 24])

arr[ np.argsort( tosort ) ]
# array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])
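Since the question also asks for the original indices, here is a sketch of the same rank-then-value ordering that returns the index vector and avoids the Python loop over unique values. It is an alternative formulation, not the code above: the name group_order is hypothetical, the input is assumed to be a non-empty 1-D array, and np.lexsort replaces the combined tosort key (so the values need not be non-negative).

```python
import numpy as np


def group_order(arr):
    """Return indices that order arr by occurrence rank, then by value."""
    arr = np.asarray(arr)
    n = len(arr)
    # A stable sort groups equal values and keeps their original relative order
    by_value = np.argsort(arr, kind="stable")
    sorted_vals = arr[by_value]
    # True where a new run of equal values starts in sorted_vals
    is_start = np.r_[True, sorted_vals[1:] != sorted_vals[:-1]]
    # Position at which the current run started
    run_start = np.maximum.accumulate(np.where(is_start, np.arange(n), 0))
    # Occurrence rank: 0 for the first copy of a value, 1 for the second, ...
    rank = np.empty(n, dtype=np.intp)
    rank[by_value] = np.arange(n) - run_start
    # lexsort treats the LAST key as the primary one: rank first, then value
    return np.lexsort((arr, rank))


a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
order = group_order(a)
print(order)
# [ 3  1  2  0  4  5  6  7 10  9  8 14 12 17 11 16 15 13]
print(a[order])
# [0 1 2 3 4 5 6 7 9 0 1 2 3 4 5 6 7 9]
```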

Here's a 2-liner:

unique, counts = np.unique(a, return_counts=True)
b = [x for y in [[u for i, u in enumerate(unique) if counts[i] > n] for n in range(counts.max())] for x in y]

Output (for an input in which 1, 5 and 9 each occur three times):

>>> b
[0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9, 1, 5, 9]
#^ reset                    ^ reset                    ^ reset
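The nested comprehension can be hard to read, so here is an unrolled sketch of the same logic: pass n keeps every unique value that occurs more than n times. The function name rounds is illustrative, and a non-empty input is assumed (counts.max() fails on an empty array).

```python
import numpy as np


def rounds(a):
    """Unrolled form of the 2-liner: one ascending pass per occurrence level."""
    unique, counts = np.unique(a, return_counts=True)
    out = []
    for n in range(counts.max()):
        # Keep the values that still have a copy left in pass n
        out.extend(unique[counts > n].tolist())
    return out


a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
print(rounds(a))
# [0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9]
```

With the question's array every value occurs exactly twice, so there are two passes and 18 elements.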

I prefer using np.bincount instead of np.unique, np.sort or np.argsort because it's much faster when the maximum value in the data is small.

import numpy as np

def count_out(arr):
    bins = np.bincount(arr, minlength=N)  # occurrences of each value 0..N-1
    threshold_idx = np.unique(bins[bins!=0])  # distinct non-zero counts, ascending
    counts = np.diff(threshold_idx, prepend=0)  # number of passes that share each mask
    mask = (bins >= threshold_idx[:, None])  # one row per distinct count
    full_mask = np.repeat(mask, counts, axis=0)  # one row per pass
    blocks = np.repeat([np.arange(N)], np.sum(counts), axis=0)
    return blocks[full_mask]

N = 10
X = np.array([3, 5, 1, 3, 9, 7, 5, 9, 0, 9, 9, 0, 0, 6, 8, 8, 8, 9, 7, 0, 5, 9, 7, 8, 3, 1, 5, 8, 8, 1, 0, 7, 1, 9, 9, 8])
print(X)
# [3 5 1 3 9 7 5 9 0 9 9 0 0 6 8 8 8 9 7 0 5 9 7 8 3 1 5 8 8 1 0 7 1 9 9 8]
print(count_out(X))
# [0 1 3 5 6 7 8 9 0 1 3 5 7 8 9 0 1 3 5 7 8 9 0 1 5 7 8 9 0 8 9 8 9 8 9 9]

The key idea is to count how many times each block repeats. Then create a unique mask for each block:

Blocks:

[[0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]
 [0 1 2 3 4 5 6 7 8 9]]

Unique masks:

[[1 1 0 1 0 1 1 1 1 1]
 [1 1 0 1 0 1 0 1 1 1]
 [1 1 0 0 0 1 0 1 1 1]
 [1 0 0 0 0 0 0 0 1 1]
 [0 0 0 0 0 0 0 0 1 1]
 [0 0 0 0 0 0 0 0 0 1]]

Finally, repeat each mask according to the counts we've got.

Counts: [1 2 1 1 2 1]

Full masks:

[[1 1 0 1 0 1 1 1 1 1]
 [1 1 0 1 0 1 0 1 1 1]
 [1 1 0 1 0 1 0 1 1 1]
 [1 1 0 0 0 1 0 1 1 1]
 [1 0 0 0 0 0 0 0 1 1]
 [0 0 0 0 0 0 0 0 1 1]
 [0 0 0 0 0 0 0 0 1 1]
 [0 0 0 0 0 0 0 0 0 1]]

By the way, it seems this can be optimised further. First, creating the repeated blocks is redundant, since there should be a way to point at one single block. Second, it's slow when the full mask is sparse. In that case you should consider implementing your own way to repeat blocks without masking.

I hope this is helpful.
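One possible way to drop the repeated blocks, offered as a sketch rather than as the optimisation the author had in mind: every block row is just 0..N-1, so the column index of each True entry in the mask already is the value, and np.nonzero can read it off directly. The names count_out_nonzero and n_values are illustrative. The mask is still dense, so the sparse-mask concern above is not addressed.

```python
import numpy as np


def count_out_nonzero(arr, n_values):
    """Same ordering as count_out, without materialising the repeated blocks."""
    bins = np.bincount(arr, minlength=n_values)  # occurrences of each value
    passes = np.arange(bins.max())[:, None]      # pass 0, 1, ..., max count - 1
    mask = bins[None, :] > passes                # value still present in this pass?
    # np.nonzero scans row by row; the column index of each hit is the value
    return np.nonzero(mask)[1]


X = np.array([3, 5, 1, 3, 9, 7, 5, 9, 0, 9, 9, 0, 0, 6, 8, 8, 8, 9,
              7, 0, 5, 9, 7, 8, 3, 1, 5, 8, 8, 1, 0, 7, 1, 9, 9, 8])
print(count_out_nonzero(X, 10))
# [0 1 3 5 6 7 8 9 0 1 3 5 7 8 9 0 1 3 5 7 8 9 0 1 5 7 8 9 0 8 9 8 9 8 9 9]
```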

IIUC, you want to have a duplicated, sorted array.

Remove the duplicated values using numpy.unique (which also sorts them), and tile to the expected size:

a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
b = np.unique(a)
b = np.tile(b, len(a)//len(b))

Output:

array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])
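Note that tiling reproduces the input's multiset only when every value occurs the same number of times, as in the question's array. A minimal sketch that makes this assumption explicit (the helper name tile_if_balanced is hypothetical, and a non-empty input is assumed):

```python
import numpy as np


def tile_if_balanced(a):
    """Tile the sorted unique values, but only if all counts are equal."""
    uniques, counts = np.unique(a, return_counts=True)
    if not np.all(counts == counts[0]):
        raise ValueError("values do not all occur equally often; tiling would change the data")
    return np.tile(uniques, counts[0])


a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
print(tile_if_balanced(a))
# [0 1 2 3 4 5 6 7 9 0 1 2 3 4 5 6 7 9]
```

For unbalanced inputs such as [0, 0, 1] this raises instead of silently returning an array with different contents.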

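Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.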
