
How to sort a NumPy array by frequency?

I am trying to sort a NumPy array by the frequency of its elements. For example, given the array [3,4,5,1,2,4,1,1,2,4], the output should be another NumPy array ordered from most common to least common element, with no duplicates: [4,1,2,3,5]. If two elements occur the same number of times, the one that appears first in the input comes first in the output. I have tried this, but I can't get a working answer. Here is my code so far:

temp1 = problems[j]
indexes = np.unique(temp1, return_index = True)[1]
temp2 = temp1[np.sort(indexes)]
temp3 = np.unique(temp1, return_counts = True)[1]
temp4 = np.argsort(temp3)[::-1] + 1

where problems[j] is a NumPy array like [3,4,5,1,2,4,1,1,2,4]. So far temp4 is [4,1,2,5,3], which is not correct because it does not handle the case where two elements have the same number of occurrences.

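You can use argsort on the frequency of each element to find the sorted positions, then apply those indexes to the array of unique elements. Note that this does not break ties by first appearance; for the array in the question it gives [4,1,2,5,3].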

import numpy as np

array = np.array([3, 4, 5, 1, 2, 4, 1, 1, 2, 4])
unique_elements, frequency = np.unique(array, return_counts=True)
sorted_indexes = np.argsort(frequency)[::-1]  # most frequent first
sorted_by_freq = unique_elements[sorted_indexes]
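To also honor the first-appearance tie-break from the question, one option is to ask np.unique for the first index of each value and sort on two keys with np.lexsort. This is a minimal sketch, not the answerer's code; the function name sort_by_frequency_stable is made up for illustration.

```python
import numpy as np

def sort_by_frequency_stable(arr):
    """Unique values, most frequent first; ties broken by first appearance."""
    arr = np.asarray(arr)
    values, first_idx, counts = np.unique(
        arr, return_index=True, return_counts=True
    )
    # np.lexsort treats the LAST key as the primary one:
    # primary key is -counts (descending frequency),
    # secondary key is first_idx (earlier first occurrence wins).
    order = np.lexsort((first_idx, -counts))
    return values[order]

print(sort_by_frequency_stable([3, 4, 5, 1, 2, 4, 1, 1, 2, 4]))
```

For the array in the question, 4 and 1 both occur three times and 4 appears first, so 4 comes before 1; likewise 3 comes before 5.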

A non-NumPy solution, which still works with NumPy arrays, is to use an OrderedCounter followed by sorted with a custom key function:

from collections import OrderedDict, Counter

class OrderedCounter(Counter, OrderedDict):
    pass

L = [3,4,5,1,2,4,1,1,2,4]

c = OrderedCounter(L)
keys = list(c)

res = sorted(c, key=lambda x: (-c[x], keys.index(x)))

print(res)

[4, 1, 2, 3, 5]
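On Python 3.7 or later, a plain Counter keeps insertion order, and Counter.most_common() is documented to list elements with equal counts in the order first encountered. Assuming that Python version, the same result can be sketched without the custom class. The function name sort_by_frequency_counter is hypothetical.

```python
from collections import Counter

import numpy as np

def sort_by_frequency_counter(values):
    """Unique values, most frequent first; ties keep first-seen order."""
    # tolist() turns NumPy scalars into plain Python numbers
    items = np.asarray(values).tolist()
    # most_common() sorts by count (descending) and is stable for ties
    return [value for value, _count in Counter(items).most_common()]

print(sort_by_frequency_counter(np.array([3, 4, 5, 1, 2, 4, 1, 1, 2, 4])))
```

This returns a Python list; wrap it in np.array(...) if a NumPy array is needed.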

You can count the occurrences of each element in the array and then use that count as the key for the built-in sorted function:

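def sortbyfreq(arr):
    arr = list(arr)  # count() and index() are list methods; an ndarray has neither
    s = set(arr)
    # Key: higher count first, then earlier first occurrence
    keys = {n: (-arr.count(n), arr.index(n)) for n in s}
    return sorted(s, key=lambda n: keys[n])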

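Using zip and itemgetter should help. Inverting the counts makes the most frequent element sort first, and the first-occurrence index breaks ties: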

from operator import itemgetter
import numpy as np
temp1 = problems[j]
temp, idx, cnt = np.unique(temp1, return_index = True, return_counts=True)
cnt = 1 / cnt
k = sorted(zip(temp, cnt, idx), key=itemgetter(1, 2))
print(next(zip(*k)))

If the values are small non-negative integers (so that bins of size 1 are all you need), you can use np.bincount:

import numpy as np

def sort_by_frequency(arr):
    return np.flip(np.argsort(np.bincount(arr))[-(np.unique(arr).size):])

v = [1,1,1,1,1,2,2,9,3,3,3,3,7,8,8]
sort_by_frequency(v)

This should yield:

array([1, 3, 8, 2, 9, 7])
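In the output above, the order of equal counts (8 before 2, 9 before 7) depends on how argsort happens to arrange ties. A minimal sketch of a bincount variant that makes the tie order explicit follows. It assumes non-negative integer input and breaks ties by smaller value first, which is a choice made here for illustration and not the first-appearance rule from the question. The function name is hypothetical.

```python
import numpy as np

def sort_by_frequency_bincount(arr):
    """Unique non-negative ints, most frequent first; ties by smaller value."""
    counts = np.bincount(arr)         # counts[v] = number of occurrences of v
    present = np.flatnonzero(counts)  # values that occur at least once, ascending
    # A stable sort on the negated counts keeps ascending value order within ties
    order = np.argsort(-counts[present], kind="stable")
    return present[order]

v = [1, 1, 1, 1, 1, 2, 2, 9, 3, 3, 3, 3, 7, 8, 8]
print(sort_by_frequency_bincount(v))
```

Here 2 and 8 both occur twice, so 2 comes first; 7 and 9 both occur once, so 7 comes first.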
