
NumPy: fastest way to transform an array's elements into their frequencies

As the title says, I am looking for a way to transform an array so that each element is replaced by the number of times that element occurs in the array.

I found np.count and np.histogram, but neither is what I am looking for.

Something like:

From:

array_ = np.array([0,0,0,1,0,0,2,0,0,1,2,0])

To:

array_ = np.array([8,8,8,2,8,8,2,8,8,2,2,8])

Thanks in advance!

If the values in your array are nonnegative integers that aren't too large, you can use np.bincount. Using your original array as an index into the bincount result gives your desired output.

>>> array_ = np.array([0,0,0,1,0,0,2,0,0,1,2,0])
>>> np.bincount(array_)
array([8, 2, 2])
>>> np.bincount(array_)[array_]
array([8, 8, 8, 2, 8, 8, 2, 8, 8, 2, 2, 8])

Bear in mind that the result of np.bincount has size max(array_) + 1, so if your array has large values this approach is inefficient: you end up creating a very large intermediate result.
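To make the size caveat concrete, here is a small sketch. The array `sparse` is a made-up example, not something from the question. It shows how a single large value inflates the intermediate result, and that np.bincount rejects negative values outright:

```python
import numpy as np

# Hypothetical input: only three elements, but one of them is large.
sparse = np.array([0, 5, 10**6])

counts = np.bincount(sparse)
# The intermediate array has max(sparse) + 1 entries, almost all of them zero.
intermediate_size = counts.size

# Indexing still gives the right frequencies; it is just wasteful.
freqs = counts[sparse]

# np.bincount raises ValueError when the input contains negative values.
try:
    np.bincount(np.array([-1, 0, 1]))
    negative_ok = True
except ValueError:
    negative_ok = False

print(intermediate_size, freqs, negative_ok)
```

Three input elements produce an intermediate array of about a million entries, which is the situation the np.unique approach avoids.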

An alternative approach that should be efficient even with large or negative inputs is to use np.unique with the return_inverse and return_counts arguments, as follows:

>>> array_ = np.array([0,0,0,1,0,0,2,0,0,1,2,0])
>>> _, inv, counts = np.unique(array_, return_inverse=True, return_counts=True)
>>> counts[inv]
array([8, 8, 8, 2, 8, 8, 2, 8, 8, 2, 2, 8])

Note that the return_counts argument was added in NumPy 1.9.0, so you'll need that version or later. If you have an older NumPy, all is not lost! You can still use the return_inverse argument of np.unique, which gives you back an array of small integers in the same arrangement as your original one. That new array is in perfect shape for np.bincount to work on efficiently:

>>> array_ = np.array([0,0,0,1,0,0,2,0,0,1,2,0])
>>> _, inverse = np.unique(array_, return_inverse=True)
>>> np.bincount(inverse)[inverse]
array([8, 8, 8, 2, 8, 8, 2, 8, 8, 2, 2, 8])

Another example, with larger array_ contents:

>>> array_ = np.array([0, 71, 598, 71, 0, 0, 243])
>>> _, inverse = np.unique(array_, return_inverse=True)
>>> inverse
array([0, 1, 3, 1, 0, 0, 2])
>>> np.bincount(inverse)[inverse]
array([3, 2, 1, 2, 3, 3, 1])
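As a further illustration of the claim that np.unique copes with large or negative inputs, here is a minimal sketch on a made-up array (`values` is hypothetical; the explicit int64 dtype is only there to keep the example platform-independent). It assumes a 1-D input and NumPy 1.9.0 or later for return_counts:

```python
import numpy as np

# Hypothetical 1-D input mixing negative and very large values,
# which np.bincount could not handle directly.
values = np.array([-7, 10**12, -7, 3, 10**12, -7], dtype=np.int64)

uniq, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)

# counts[i] is the frequency of uniq[i]; inverse maps each element of
# values to its position in uniq, so this gives per-element frequencies.
freqs = counts[inverse]

print(uniq)   # sorted distinct values
print(freqs)  # frequency of each element of values, in original order
```

The intermediate arrays here have one entry per distinct value, not one per integer up to the maximum.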

All of these solutions work in pure NumPy, so they should be significantly more efficient than a solution that goes via a Python Counter or dict. As always, though, if efficiency is a concern then you should profile to find out what's most suitable. Note in particular that np.unique does a sort under the hood, so its theoretical complexity is higher than that of the pure np.bincount solution. Whether that makes a difference in practice is impossible to say without timing.

So let's do some timing, using IPython's timeit (this is on Python 3.4). First we'll define functions for the operations we need:

In [1]: import numpy as np; from collections import Counter

In [2]: def freq_bincount(array):
   ...:     return np.bincount(array)[array]
   ...: 

In [3]: def freq_unique(array):
   ...:     _, inverse, counts = np.unique(array, return_inverse=True, return_counts=True)
   ...:     return counts[inverse]
   ...: 

In [4]: def freq_counter(array):
   ...:     c = Counter(array)
   ...:     return np.array(list(map(c.get, array)))
   ...: 

Now we create a test array:

In [5]: test_array = np.random.randint(100, size=10**6)

And then we do some timings. Here are the results on my machine:

In [6]: %timeit freq_bincount(test_array)
100 loops, best of 3: 2.69 ms per loop

In [7]: %timeit freq_unique(test_array)
10 loops, best of 3: 166 ms per loop

In [8]: %timeit freq_counter(test_array)
1 loops, best of 3: 317 ms per loop

There's more than an order of magnitude between the np.bincount approach and the np.unique approach (roughly 60x in this run). The Counter approach from @Kasramvd's solution is about twice as slow as the np.unique approach, but that could change on a different machine or with different versions of Python and NumPy: you should test with data that are appropriate for your use-case.
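If you are not working in IPython, a comparison along the same lines can be sketched with the standard-library timeit module. This is only a rough harness: the three functions are the ones defined above, while `sample`, the seed, and the repeat counts are arbitrary choices of mine, and np.random.default_rng assumes NumPy 1.17 or later. No timings are shown because they depend on your machine:

```python
import timeit
from collections import Counter

import numpy as np


def freq_bincount(array):
    return np.bincount(array)[array]


def freq_unique(array):
    _, inverse, counts = np.unique(array, return_inverse=True, return_counts=True)
    return counts[inverse]


def freq_counter(array):
    c = Counter(array)
    return np.array(list(map(c.get, array)))


# Placeholder data: replace with an array representative of your use-case.
rng = np.random.default_rng(0)
sample = rng.integers(0, 100, size=10**5)

expected = freq_bincount(sample)
timings = {}
for func in (freq_bincount, freq_unique, freq_counter):
    # Check that every approach agrees before trusting its timing.
    assert np.array_equal(func(sample), expected)
    # Best of 3 repeats, 3 calls each, reported per call.
    best = min(timeit.repeat(lambda: func(sample), number=3, repeat=3))
    timings[func.__name__] = best / 3

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.2f} ms per call")
```

Changing the range or the size of `sample` is a quick way to see how the ranking shifts, for example when the values become large and sparse.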

As a straightforward approach, you can use collections.Counter, which is the more Pythonic way of getting the frequencies of an iterable's items:

>>> import numpy as np
>>> array_ = np.array([0,0,0,1,0,0,2,0,0,1,2,0])
>>> from collections import Counter
>>> c=Counter(array_)
>>> # list() is needed on Python 3, where map returns an iterator
>>> np.array(list(map(c.get, array_)))
array([8, 8, 8, 2, 8, 8, 2, 8, 8, 2, 2, 8])
