How do I count the number of occurrences of each datapoint in an ndarray?
What I want to do is run a OneHotEncoder on all the values that are present at least N times in my ndarray.
I also want to replace all the values that appear fewer than N times with another element that doesn't appear in the array (let's call it new_value).
So, for example, I have:
import numpy as np
# Recent NumPy versions require dtype=object for ragged nested lists.
a = np.array([[[2], [2,3], [3,34]],
              [[3], [4,5], [3,34]],
              [[3], [2,3], [3,4]]], dtype=object)
With threshold N=2, I want something like this (pseudocode):
b = [OneHotEncoder(a[:,[i]])[0] if count(a[:,[i]]) >= N
     else OneHotEncoder(new_value) for i in range(a.shape[1])]
Just to illustrate the substitutions I want, ignoring the OneHotEncoder and using new_value=10, my array should look like:
a = np.array([[[10], [2,3], [3,34]],
              [[3], [10], [3,34]],
              [[3], [2,3], [10]]], dtype=object)
How about something like this?
First, count how many times each unique element appears in the array:
>>> a=np.random.randint(0,5,(3,3))
>>> a
array([[0, 1, 4],
[0, 2, 4],
[2, 4, 0]])
>>> ua,uind=np.unique(a,return_inverse=True)
>>> count=np.bincount(uind)
>>> ua
array([0, 1, 2, 4])
>>> count
array([3, 1, 2, 3])
The ua and count arrays show that 0 appears 3 times, 1 appears once, and so on.
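As a side note, np.unique can also return the counts directly through its return_counts argument, which avoids the separate bincount step. A minimal sketch, with the array from the session above hard-coded so the result is reproducible (the value_counts dictionary is only there for illustration):

```python
import numpy as np

# Same values as the random array shown above, hard-coded for reproducibility.
a = np.array([[0, 1, 4],
              [0, 2, 4],
              [2, 4, 0]])

# return_counts=True yields the unique values and how often each one occurs.
ua, count = np.unique(a, return_counts=True)

# Pair each unique value with its count.
value_counts = {int(v): int(c) for v, c in zip(ua, count)}
print(value_counts)  # {0: 3, 1: 1, 2: 2, 4: 3}
```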
import numpy as np
def mask_fewest(arr,thresh,replace):
    # ua holds the unique elements; uind maps every element of arr to its index in ua.
    ua,uind=np.unique(arr,return_inverse=True)
    # count[i] is the number of times ua[i] appears.
    # ravel() keeps uind 1-D on NumPy versions that return it with arr's shape.
    count=np.bincount(uind.ravel())
    # Mask the elements that appear fewer than `thresh` times
    # (@Jamie's suggestion to make rep_mask faster).
    rep_mask = np.in1d(uind, np.where(count < thresh))
    # Replace the masked elements in place; arr itself is modified.
    arr.flat[rep_mask]=replace
    return arr
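Note that mask_fewest writes into arr, so the caller's array is changed as well. If that is not wanted, a copy-based variant is one option. The sketch below is an alternative under stated assumptions rather than the code above: the name mask_fewest_copy is hypothetical, and it swaps in return_counts and np.isin in place of bincount and in1d. The 4x4 input is hard-coded for illustration.

```python
import numpy as np

def mask_fewest_copy(arr, thresh, replace):
    """Return a copy of arr with values seen fewer than `thresh` times replaced."""
    arr = np.asarray(arr)
    # Unique values and the number of times each one occurs.
    ua, count = np.unique(arr, return_counts=True)
    # Values that occur fewer than `thresh` times.
    rare = ua[count < thresh]
    # Boolean mask with the same shape as arr, True where the value is rare.
    mask = np.isin(arr, rare)
    out = arr.copy()
    out[mask] = replace
    return out

a = np.array([[6, 7, 7, 3],
              [7, 5, 4, 3],
              [3, 5, 2, 3],
              [3, 3, 7, 7]])
b = mask_fewest_copy(a, 2, 10)
print(b)        # 6, 4 and 2 each occur once, so they become 10
print(a[0, 0])  # the input is left untouched
```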
>>> a=np.random.randint(2,8,(4,4))
>>> print(a)
[[6 7 7 3]
 [7 5 4 3]
 [3 5 2 3]
 [3 3 7 7]]
>>> print(mask_fewest(a,2,10))
[[10 7 7 3]
[ 7 5 10 3]
[ 3 5 10 3]
[ 3 3 7 7]]
For the example in the question (let me know whether you intended a 2D or a 3D array):
>>> a
[[[2] [2, 3] [3, 34]]
[[3] [4, 5] [3, 34]]
[[3] [2, 3] [3, 4]]]
>>> mask_fewest(a,2,10)
[[10 [2, 3] [3, 34]]
[[3] 10 [3, 34]]
[[3] [2, 3] 10]]
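If each cell really is a Python list (a ragged, object-dtype array), one possibility is to treat every list as a single datapoint by converting it to a hashable tuple and counting with collections.Counter. This is only a sketch under that assumption: the helper name replace_rare_cells is hypothetical, and it works on plain nested lists instead of relying on how np.unique handles list objects.

```python
from collections import Counter

def replace_rare_cells(rows, thresh, new_value):
    """Replace every cell that occurs fewer than `thresh` times with a copy of new_value."""
    # Lists are not hashable, so count each cell as a tuple.
    counts = Counter(tuple(cell) for row in rows for cell in row)
    return [
        [cell if counts[tuple(cell)] >= thresh else list(new_value) for cell in row]
        for row in rows
    ]

a = [[[2], [2, 3], [3, 34]],
     [[3], [4, 5], [3, 34]],
     [[3], [2, 3], [3, 4]]]

b = replace_rare_cells(a, 2, [10])
for row in b:
    print(row)
```

With N=2 and new_value=[10], the cells [2], [4,5] and [3,4] each occur only once, so they are the ones replaced, which matches the substitution described in the question.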