As the title says, I am looking for a way to transform an array into an array holding the frequency of each of its elements.
I found np.count and np.histogram, but they are not what I am looking for.
Something like:
From:
array_ = np.array([0,0,0,1,0,0,2,0,0,1,2,0])
To:
array_ = np.array([8,8,8,2,8,8,2,8,8,2,2,8])
Thanks in advance!
If the values in your array are nonnegative integers which aren't too large, you can use np.bincount. Using your original array as an index into the bincount result gives your desired output.
>>> array_ = np.array([0,0,0,1,0,0,2,0,0,1,2,0])
>>> np.bincount(array_)
array([8, 2, 2])
>>> np.bincount(array_)[array_]
array([8, 8, 8, 2, 8, 8, 2, 8, 8, 2, 2, 8])
Bear in mind that the result of np.bincount has size max(array_) + 1, so if your array has large values this approach is inefficient: you end up creating a very large intermediate result.
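To make that size caveat concrete, here is a minimal sketch. The arrays `small`, `large` and `negative` are made up for illustration and are not part of the question. It shows how the length of the bincount result tracks the largest value rather than the number of elements, and that negative values are rejected outright.

```python
import numpy as np

# Hypothetical inputs chosen only to illustrate the size of the intermediate result.
small = np.array([0, 1, 2, 1])
large = np.array([0, 1, 10**6])
negative = np.array([-1, 0, 1])

# The result always has max(array) + 1 entries, however few inputs there are.
small_size = np.bincount(small).size
large_size = np.bincount(large).size
print(small_size, large_size)

# np.bincount refuses negative values with a ValueError.
try:
    np.bincount(negative)
    negative_rejected = False
except ValueError as exc:
    negative_rejected = True
    print("negative input rejected:", exc)
```

Three input values are enough to allocate about a million counters here, which is the inefficiency described above.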
An alternative approach that should be efficient even with large or negative inputs is to use np.unique with the return_inverse and return_counts arguments, as follows:
>>> array_ = np.array([0,0,0,1,0,0,2,0,0,1,2,0])
>>> _, inv, counts = np.unique(array_, return_inverse=True, return_counts=True)
>>> counts[inv]
array([8, 8, 8, 2, 8, 8, 2, 8, 8, 2, 2, 8])
Note that the return_counts argument is new in NumPy 1.9.0, so you'll need an up-to-date version of NumPy. If you don't have NumPy 1.9.0, all is not lost! You can still use the return_inverse argument of np.unique, which gives you back an array of small integers in the same arrangement as your original one. That new array is now in perfect shape for bincount to work on it efficiently:
>>> array_ = np.array([0,0,0,1,0,0,2,0,0,1,2,0])
>>> _, inverse = np.unique(array_, return_inverse=True)
>>> np.bincount(inverse)[inverse]
array([8, 8, 8, 2, 8, 8, 2, 8, 8, 2, 2, 8])
Another example, with larger array_ contents:
>>> array_ = np.array([0, 71, 598, 71, 0, 0, 243])
>>> _, inverse = np.unique(array_, return_inverse=True)
>>> inverse
array([0, 1, 3, 1, 0, 0, 2])
>>> np.bincount(inverse)[inverse]
array([3, 2, 1, 2, 3, 3, 1])
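The two steps above can be wrapped in a small helper. This is only a sketch: the name `element_frequencies` is hypothetical, and flattening the input before calling np.unique is my own assumption, made so that the inverse indices are one-dimensional regardless of NumPy version.

```python
import numpy as np

def element_frequencies(values):
    """Return an array of the same shape holding each element's count."""
    values = np.asarray(values)
    # np.unique maps every element to a small index into the sorted unique values.
    _, inverse = np.unique(values.ravel(), return_inverse=True)
    # The indices are small nonnegative integers, so bincount stays compact.
    counts = np.bincount(inverse)
    return counts[inverse].reshape(values.shape)

# Negative and very large values are fine, because only the indices are counted.
mixed = element_frequencies([-5, 10**9, -5, 3])
print(mixed)

# The same idea works for non-integer data such as strings.
letters = element_frequencies(["a", "b", "a"])
print(letters)
```

Because bincount only ever sees the inverse indices, the intermediate array has one entry per distinct value rather than one per possible value.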
All of these solutions work in pure NumPy, so they should be significantly more efficient than a solution that goes via a Python Counter or dict. As always, though, if efficiency is a concern then you should profile to find out what's most suitable. Note in particular that np.unique is doing a sort under the hood, so its theoretical complexity is higher than that of the pure np.bincount solution. Whether that makes a difference in practice is impossible to say without timing. So let's do some timing, using IPython's timeit (this is on Python 3.4). First we'll define functions for the operations we need:
In [1]: import numpy as np; from collections import Counter
In [2]: def freq_bincount(array):
   ...:     return np.bincount(array)[array]
   ...:
In [3]: def freq_unique(array):
   ...:     _, inverse, counts = np.unique(array, return_inverse=True, return_counts=True)
   ...:     return counts[inverse]
   ...:
In [4]: def freq_counter(array):
   ...:     c = Counter(array)
   ...:     return np.array(list(map(c.get, array)))
   ...:
Now we create a test array:
In [5]: test_array = np.random.randint(100, size=10**6)
And then we do some timings. Here are the results on my machine:
In [6]: %timeit freq_bincount(test_array)
100 loops, best of 3: 2.69 ms per loop
In [7]: %timeit freq_unique(test_array)
10 loops, best of 3: 166 ms per loop
In [8]: %timeit freq_counter(test_array)
1 loops, best of 3: 317 ms per loop
There's an order-of-magnitude difference between the np.bincount approach and the np.unique approach. The Counter approach from @Kasramvd's solution is somewhat slower than the np.unique approach, but that could change on a different machine or with different versions of Python and NumPy: you should test with data that are appropriate for your use-case.
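If you want to repeat the comparison outside IPython, a rough sketch using the standard-library timeit module might look like the following. The smaller array size and the repeat counts are arbitrary choices of mine to keep the run short, and no timings are shown because they depend on your machine and versions.

```python
import timeit
from collections import Counter

import numpy as np

def freq_bincount(array):
    return np.bincount(array)[array]

def freq_unique(array):
    _, inverse, counts = np.unique(array, return_inverse=True, return_counts=True)
    return counts[inverse]

def freq_counter(array):
    c = Counter(array)
    return np.array(list(map(c.get, array)))

# Smaller than the 10**6 elements used above, purely to keep this sketch quick.
test_array = np.random.randint(100, size=10**5)

for func in (freq_bincount, freq_unique, freq_counter):
    # Best of 3 repeats, 3 calls per repeat, reported per call.
    best = min(timeit.repeat(lambda: func(test_array), number=3, repeat=3)) / 3
    print("{}: {:.3f} ms per call".format(func.__name__, best * 1000))
```

All three functions should return identical arrays, so it is worth checking that with np.array_equal before trusting any timing differences.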
As a fast approach you can use collections.Counter, which is a more Pythonic way of getting the frequency of an iterable's items:
>>> import numpy as np
>>> array_ = np.array([0,0,0,1,0,0,2,0,0,1,2,0])
>>> from collections import Counter
>>> c = Counter(array_)
>>> # list() is needed on Python 3, where map returns an iterator
>>> np.array(list(map(c.get, array_)))
array([8, 8, 8, 2, 8, 8, 2, 8, 8, 2, 2, 8])