I have the following array[][]:
import numpy as np
data = np.array([
[20, 0, 1],
[22, 0, 1],
[31, 0, 0],
[49, 1, 0],
[96, 1, 0],
[57, 2, 1],
[45, 3, 0],
[12, 3, 0],
[14, 3, 1],
[33, 4, 1],
[34, 4, 1],
[15, 4, 1]
])
Lets call the columns: a, b, c
by the above order where b is id
. I want to count the number of 1's in the c
column by the id's in column b
this will result the following 2 columns array (1st column is unique(b) and 2nd column is counted 1's from c
per that id):
data = np.array([
[4, 3],
[0, 2],
[2, 1],
[3, 1],
[1, 0]
])
You can also see it is sorted by the counted 1's in the c
column
My idea to solve this problem was to create a dictionery: { id1:counted 1's id2:counted 1's ... }
by the ids in the b column and iterate over the array and count the number of 1's per id and put it as value per key in the dictionary, then create an array out of the result and sort it by the 2nd column.
Is there any pythonic easy and better way to do so ?
another case scenario is where i want to add all the integers in the c column per id, so for:
data = np.array([
[20, 0, 2],
[22, 0, 1],
[31, 0, 0],
[49, 1, 0],
[96, 1, 0],
[57, 2, 1],
[45, 3, 0],
[12, 3, 5],
[14, 3, 1],
[33, 4, 1],
[34, 4, 3],
[15, 4, 4]
])
I will get
data = np.array([
[4, 8],
[3, 6],
[0, 3],
[2, 1],
[1, 0]
])
You can use np.bincount
-
count = np.bincount(data[:,1],data[:,2]==1)
out = np.column_stack((np.unique(data[:,1]),count))
If you need it in descending order of count, we need to add two more lines of code -
sidx = count.argsort()[::-1]
out = np.column_stack((sidx,count[sidx]))
Alternatively, if you need it in descending order of count and also keep the order, use argsort
with 'mergesort'
, like so -
sidx = (-count).argsort(kind='mergesort')
out = np.column_stack((sidx,count[sidx]))
Sample run -
Input array :
In [36]: data
Out[36]:
array([[20, 0, 1],
[22, 0, 1],
[31, 0, 0],
[49, 1, 0],
[96, 1, 0],
[57, 2, 1],
[45, 3, 0],
[12, 3, 0],
[14, 3, 1],
[33, 4, 1],
[34, 4, 1],
[15, 4, 1]])
Part 1 :
In [37]: count = np.bincount(data[:,1],data[:,2]==1)
...: out = np.column_stack((np.unique(data[:,1]),count))
...:
In [38]: out
Out[38]:
array([[ 0., 2.],
[ 1., 0.],
[ 2., 1.],
[ 3., 1.],
[ 4., 3.]])
Part 2 :
In [39]: sidx = count.argsort()[::-1]
...: out = np.column_stack((sidx,count[sidx]))
...:
In [40]: out
Out[40]:
array([[ 4., 3.],
[ 0., 2.],
[ 3., 1.],
[ 2., 1.],
[ 1., 0.]])
Part 3 :
In [48]: sidx = (-count).argsort(kind='mergesort')
In [49]: np.column_stack((sidx,count[sidx]))
Out[49]:
array([[ 4., 3.],
[ 0., 2.],
[ 2., 1.],
[ 3., 1.],
[ 1., 0.]])
To add all the integers in the c column per id, simply skip that check against 1
-
count = np.bincount(data[:,1],data[:,2])
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.