Sort array with repeated values
I need to sort an array whose values range from 0 to 9 and are repeated, and obtain the corresponding indices into the original array. The input array is:
[3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4]
I would like to obtain the following order:
array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9], dtype=uint8)
Instead of:
array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 9, 9])
which is given by:
import numpy as np
a = [3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4]
np.argsort(a)
Is there a way to make this function produce that order?
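l = [1, 2, 3, 4, 5, 6, 7, 8, 9, 4, 3, 5]
l_ordered = []
while len(l) != 0:
    # Take one copy of every value still present, in ascending order
    unique_nums = sorted(set(l))
    l_ordered.extend(unique_nums)
    # Remove the copies just used (list.remove drops only the first match)
    for num in unique_nums:
        l.remove(num)
print(l_ordered)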
This results in:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 3, 4, 5]
You can apply the same idea with NumPy, or convert the final result into a NumPy array.
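As a rough sketch of that last remark, the same "peel off one sorted round at a time" idea can be written with a `collections.Counter`, which avoids the repeated `list.remove` scans, and then converted to a NumPy array at the end. The function name `peel_sorted_rounds` is made up for this illustration and is not part of the answer above.

```python
from collections import Counter

import numpy as np


def peel_sorted_rounds(values):
    """Emit one sorted copy of every remaining value per round (hypothetical helper)."""
    remaining = Counter(values)          # value -> copies still to emit
    ordered = []
    while remaining:
        round_values = sorted(remaining)  # every value that still has a copy left
        ordered.extend(round_values)
        for v in round_values:
            remaining[v] -= 1
            if remaining[v] == 0:
                del remaining[v]          # value exhausted, skip it in later rounds
    return np.array(ordered)


print(peel_sorted_rounds([1, 2, 3, 4, 5, 6, 7, 8, 9, 4, 3, 5]))
# [1 2 3 4 5 6 7 8 9 3 4 5]
```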
Very interesting task! Here is my attempt at solving the problem:
import numpy as np

def groupsort(a: np.ndarray):
    uniques, counts = np.unique(a, return_counts=True)
    min_count = int(np.min(counts))  # Easy case: every value occurs at least this often
    counts -= min_count
    rest = []
    i = 0
    while any(counts):  # Hard case: cycle over the values that still have copies left
        if counts[i]:
            rest.append(uniques[i])  # append the value itself, not its position
            counts[i] -= 1
        i = (i + 1) % len(counts)  # wrap around over all unique values
    return np.array(list(uniques) * min_count + rest)

a = np.array(list(range(4)) * 2 + [1,1,1,1,1,0,0,0,2,2])
np.random.shuffle(a)
print(a)
# (some random permutation of the 18 input values)
print(groupsort(a))
# [0 1 2 3 0 1 2 3 0 1 2 0 1 2 0 1 1 1]
The idea is to split the problem into two cases: an easy one and a hard one. The easy case handles inputs like a = [0,1,2,3,0,1,2,3], where the counts for each unique value are equal. Then you can simply count the number of occurrences n of a specific value (e.g. 0) and compute list(range(max(a) + 1)) * n.
The hard case handles inputs such as a = [1,1,1,1,1,0,0,0,2,2]. The idea is to get the count of each value, in this case counts = [3,5,2,0] (the trailing 0 belongs to the value 3 from the full example, which has no copies left once the easy case is done). Then do:
rest = []
i = 0
while any(counts):
    if counts[i]:
        rest.append(i)  # i is also the value here, since the values are 0..3
        counts[i] -= 1
    i = (i + 1) % len(counts)
In my solution you can see that I have combined the two cases.
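To make the split concrete, here is a small trace of the intermediate quantities for the example input (before shuffling, which does not affect the counts). It is only an illustration; the names `easy_part` and `remaining` are introduced here and do not appear in the answer's code.

```python
import numpy as np

a = np.array([0, 1, 2, 3] * 2 + [1, 1, 1, 1, 1, 0, 0, 0, 2, 2])

uniques, counts = np.unique(a, return_counts=True)
min_count = int(counts.min())

# Easy case: every value occurs at least min_count times,
# so min_count full sorted rounds can be emitted directly.
easy_part = np.tile(uniques, min_count)

# Hard case: whatever is left after those full rounds.
remaining = counts - min_count

print(uniques)    # [0 1 2 3]
print(counts)     # [5 7 4 2]
print(easy_part)  # [0 1 2 3 0 1 2 3]
print(remaining)  # [3 5 2 0]
```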
For an array with a small number of elements, @Mr.O's answer is faster. The code below is faster if there are more than around 100 ints in arr.
import numpy as np

def sort_groups( arr ):
    ct = np.ones( len(arr), dtype = np.int64 )
    for i in set( arr ):
        ct[arr == i] = ct[arr == i ].cumsum()
    # ct holds, for each element, its occurrence number (1 = first copy, 2 = second, ...)
    tosort = ( arr.max() + 1 ) * ct + arr
    # tosort orders by ct first, then by the value in arr when the ct's are equal
    return arr[ np.argsort( tosort ) ]

a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
sort_groups( a )
# array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])
Breaking the function down to see what's happening:
arr = a
ct = np.ones( len(arr), dtype = np.int64 )
for i in set( arr ):
    ct[arr == i] = ct[arr == i ].cumsum()
arr, ct
# (array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4]),
#  array([1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2]))
tosort = ( arr.max() + 1 ) * ct + arr  # Assumes arr is >= 0
tosort
# array([13, 11, 12, 10, 14, 15, 16, 17, 21, 20, 19, 25, 23, 29, 22, 27, 26, 24])
arr[ np.argsort( tosort ) ]
# array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])
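The question also asks for the indices into the original array, not only the reordered values. A possible loop-free sketch of the same "occurrence number first, value second" ranking is given here; it swaps the per-value loop for a stable `np.argsort` plus `np.lexsort`, so it is a variation on the answer rather than the answer's own method. The name `occurrence_order` is invented for this example, and a non-empty 1-D integer array is assumed.

```python
import numpy as np


def occurrence_order(arr):
    """Return indices that order arr by occurrence number, then by value (sketch)."""
    arr = np.asarray(arr)
    n = len(arr)
    # Stable sort keeps equal values in their original order.
    order = np.argsort(arr, kind="stable")
    sorted_arr = arr[order]
    # True where a new run of equal values starts in the sorted array.
    is_start = np.r_[True, sorted_arr[1:] != sorted_arr[:-1]]
    # For every sorted position, the position where its run started.
    run_start = np.maximum.accumulate(np.where(is_start, np.arange(n), 0))
    # 0 for the first copy of a value, 1 for the second, and so on.
    rank_sorted = np.arange(n) - run_start
    rank = np.empty(n, dtype=rank_sorted.dtype)
    rank[order] = rank_sorted
    # np.lexsort uses the LAST key as the primary key: rank first, then value.
    return np.lexsort((arr, rank))


a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
idx = occurrence_order(a)
print(idx)     # [ 3  1  2  0  4  5  6  7 10  9  8 14 12 17 11 16 15 13]
print(a[idx])  # [0 1 2 3 4 5 6 7 9 0 1 2 3 4 5 6 7 9]
```

Because the result is an index array, it can be used to reorder any other array aligned with `a`, which the value-only versions cannot do.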
Here's a two-liner:
unique, counts = np.unique(a, return_counts=True)
b = [x for y in [[u for i, u in enumerate(unique) if counts[i] > n] for n in range(counts.max())] for x in y]
Output (this run evidently used an input in which 1, 5 and 9 each occur three times):
>>> b
[0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9, 1, 5, 9]
#^ reset                    ^ reset                    ^ reset
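The nested comprehension is dense, so here is an unrolled sketch of what it appears to do: in round n it keeps every unique value that occurs more than n times. The function name `rounds_from_counts` is made up for this illustration, and a non-empty input is assumed (otherwise `counts.max()` fails).

```python
import numpy as np


def rounds_from_counts(a):
    """Unrolled form of the two-liner (illustrative sketch)."""
    unique, counts = np.unique(a, return_counts=True)
    result = []
    for n in range(counts.max()):
        # Values that still have a copy left in round n, already in sorted order.
        result.extend(unique[counts > n].tolist())
    return result


a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
print(rounds_from_counts(a))
# [0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9]
```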
I prefer using np.bincount instead of np.unique, np.sort or np.argsort because it's much faster when the maximum value in the data is small.
import numpy as np

N = 10  # number of possible values (0..9)

def count_out(arr):
    bins = np.bincount(arr, minlength=N)        # occurrences of each value 0..N-1
    threshold_idx = np.unique(bins[bins!=0])    # distinct non-zero occurrence counts, ascending
    counts = np.diff(threshold_idx, prepend=0)  # number of rounds each mask stays valid
    mask = (bins >= threshold_idx[:, None])     # one row per threshold: values still available
    full_mask = np.repeat(mask, counts, axis=0) # one row per output round
    blocks = np.repeat([np.arange(N)], np.sum(counts), axis=0)
    return blocks[full_mask]

X = np.array([3, 5, 1, 3, 9, 7, 5, 9, 0, 9, 9, 0, 0, 6, 8, 8, 8, 9, 7, 0, 5, 9, 7, 8, 3, 1, 5, 8, 8, 1, 0, 7, 1, 9, 9, 8])
print(X)
print(count_out(X))
# [3 5 1 3 9 7 5 9 0 9 9 0 0 6 8 8 8 9 7 0 5 9 7 8 3 1 5 8 8 1 0 7 1 9 9 8]
# [0 1 3 5 6 7 8 9 0 1 3 5 7 8 9 0 1 3 5 7 8 9 0 1 5 7 8 9 0 8 9 8 9 8 9 9]
The key idea is to find counts, i.e. how many times each block repeats. Then create a unique mask for each block:
Blocks:
[[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]]
Unique masks:
[[1 1 0 1 0 1 1 1 1 1]
[1 1 0 1 0 1 0 1 1 1]
[1 1 0 0 0 1 0 1 1 1]
[1 0 0 0 0 0 0 0 1 1]
[0 0 0 0 0 0 0 0 1 1]
[0 0 0 0 0 0 0 0 0 1]]
Finally, build the full set of masks by repeating each unique mask according to the counts we've got.
Counts: [1 2 1 1 2 1]
Full masks:
[[1 1 0 1 0 1 1 1 1 1]
[1 1 0 1 0 1 0 1 1 1]
[1 1 0 1 0 1 0 1 1 1]
[1 1 0 0 0 1 0 1 1 1]
[1 0 0 0 0 0 0 0 1 1]
[0 0 0 0 0 0 0 0 1 1]
[0 0 0 0 0 0 0 0 1 1]
[0 0 0 0 0 0 0 0 0 1]]
By the way, it seems this can be optimised further. First, creating the repeated blocks is redundant, since there should be a way to point at one single block instead. Second, it is slow when the full mask is sparse; in that case you should consider implementing your own way of repeating blocks without masking.
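For the first of those two points, one option that might fit is `np.broadcast_to`, which returns a read-only view of a single row with the requested shape instead of materialising every copy. The sketch below is a variation under that assumption, not a benchmarked improvement; the name `count_out_broadcast` and the explicit `n_values` parameter are introduced here for illustration, and non-negative integer input is assumed as with `np.bincount` in general.

```python
import numpy as np


def count_out_broadcast(arr, n_values):
    """Variant of count_out that views one block instead of copying it (sketch)."""
    bins = np.bincount(arr, minlength=n_values)
    thresholds = np.unique(bins[bins != 0])        # distinct non-zero occurrence counts
    repeats = np.diff(thresholds, prepend=0)       # rounds covered by each threshold
    mask = bins >= thresholds[:, None]
    full_mask = np.repeat(mask, repeats, axis=0)
    # One row of candidate values, viewed (not copied) once per output round.
    blocks = np.broadcast_to(np.arange(len(bins)), full_mask.shape)
    return blocks[full_mask]                        # boolean indexing returns a fresh array


a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
print(count_out_broadcast(a, 10))
# [0 1 2 3 4 5 6 7 9 0 1 2 3 4 5 6 7 9]
```

The boolean mask itself is still built in full here, so this only addresses the repeated blocks, not the sparse-mask concern.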
I hope this is helpful as it stands.
IIUC, you want a duplicated, sorted array.
Remove the duplicated values using numpy.unique (which also returns them sorted), and tile to the expected size:
a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
b = np.unique(a)
b = np.tile(b, len(a)//len(b))
Output:
array([0, 1, 2, 3, 4, 5, 6, 7, 9, 0, 1, 2, 3, 4, 5, 6, 7, 9])
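One caveat worth spelling out: tiling reproduces the requested order only when every value occurs the same number of times, as it does in the question's input. A defensive sketch that checks this first could look as follows; the function name `tile_if_balanced` is hypothetical, and a non-empty input is assumed.

```python
import numpy as np


def tile_if_balanced(a):
    """Tile the sorted unique values, but only if all counts are equal (sketch)."""
    values, counts = np.unique(a, return_counts=True)
    if not np.all(counts == counts[0]):
        raise ValueError("tiling only works when every value occurs equally often")
    return np.tile(values, counts[0])


a = np.array([3, 1, 2, 0, 4, 5, 6, 7, 1, 0, 9, 5, 3, 9, 2, 7, 6, 4])
print(tile_if_balanced(a))
# [0 1 2 3 4 5 6 7 9 0 1 2 3 4 5 6 7 9]
```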