
python: count events by certain id

I have the following 2-D NumPy array:

import numpy as np

data = np.array([
    [20,  0,  1],
    [22,  0,  1],
    [31,  0,  0],
    [49,  1,  0],
    [96,  1,  0],
    [57,  2,  1],
    [45,  3,  0],
    [12,  3,  0],
    [14,  3,  1],
    [33,  4,  1],
    [34,  4,  1],
    [15,  4,  1]
])

Let's call the columns a, b and c, in that order, where b is an id. I want to count the number of 1's in column c for each id in column b. This should give the following two-column array (the first column is unique(b) and the second is the number of 1's in c for that id):

data = np.array([
    [4,  3],
    [0,  2],
    [2,  1],
    [3,  1],
    [1,  0]
])

As you can see, it is sorted in descending order of the count of 1's in column c.

My idea was to build a dictionary { id1: count of 1's, id2: count of 1's, ... } keyed by the ids in column b, iterate over the array counting the 1's for each id and storing the count as that key's value, then create an array from the result and sort it by the second column.
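The dictionary approach described above could look roughly like this in plain Python. This is a sketch, not code from the original post; the helper name `count_ones_per_id` is hypothetical.

```python
import numpy as np

def count_ones_per_id(data):
    """Hypothetical helper: count the 1's in column c for each id in column b."""
    counts = {}
    for _, b, c in data:
        key = int(b)
        # register every id, even one whose c values contain no 1's
        counts.setdefault(key, 0)
        if c == 1:
            counts[key] += 1
    # sort by count, descending; Python's sort is stable even with reverse=True,
    # so tied ids keep their first-seen order
    pairs = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return np.array(pairs)

data = np.array([
    [20, 0, 1], [22, 0, 1], [31, 0, 0],
    [49, 1, 0], [96, 1, 0], [57, 2, 1],
    [45, 3, 0], [12, 3, 0], [14, 3, 1],
    [33, 4, 1], [34, 4, 1], [15, 4, 1],
])
result = count_ones_per_id(data)
print(result)
```

This works, but the Python-level loop visits every row one at a time, which tends to be slow on large arrays; a vectorized NumPy approach avoids that loop.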

Is there an easier, more Pythonic way to do this?

Another scenario is where I want to sum all the integers in column c per id, so for:

data = np.array([
    [20,  0,  2],
    [22,  0,  1],
    [31,  0,  0],
    [49,  1,  0],
    [96,  1,  0],
    [57,  2,  1],
    [45,  3,  0],
    [12,  3,  5],
    [14,  3,  1],
    [33,  4,  1],
    [34,  4,  3],
    [15,  4,  4]
])

I would get:

data = np.array([
    [4,  8],
    [3,  6],
    [0,  3],
    [2,  1],
    [1,  0]
])

You can use np.bincount:

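# (c == 1) acts as 0/1 weights, so bincount sums the 1's per id in b (result is float64)
count = np.bincount(data[:,1],data[:,2]==1)
# np.unique(b) lines up with count only if the ids are exactly 0..max(b)
out = np.column_stack((np.unique(data[:,1]),count))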
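To illustrate what np.bincount does with a weights argument (a small standalone example, not from the original answer): output slot i holds the sum of the weights at the positions where the input equals i, the output has length max(input) + 1, and the result is float64 whenever weights are given.

```python
import numpy as np

ids = np.array([0, 0, 1, 3])           # note: id 2 never occurs
is_one = np.array([1, 1, 0, 1]) == 1   # boolean mask, used as 0/1 weights

counts = np.bincount(ids, weights=is_one)
# counts has length max(ids) + 1 == 4, so the missing id 2 still gets a 0.0 slot
print(counts)                  # [2. 0. 0. 1.]
print(np.unique(ids).size)     # 3, one fewer than counts.size
```

This is why pairing the counts with np.unique(b) only works when no id between 0 and max(b) is missing.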

If you need it in descending order of count, add two more lines of code:

# positions of count in descending order; they double as ids because ids are 0..max(b)
sidx = count.argsort()[::-1]
out = np.column_stack((sidx,count[sidx]))

Alternatively, if you need it in descending order of count but also want tied counts to keep their original (ascending id) order, negate the counts and use argsort with kind='mergesort', which is a stable sort:

# stable sort on -count: descending count, ties stay in ascending id order
sidx = (-count).argsort(kind='mergesort')
out = np.column_stack((sidx,count[sidx]))
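The sorting snippets above use positions in count as the ids, which is only valid when the ids are exactly 0..max(b). If the ids can be arbitrary (for example 7, 10, 42), one possible generalization, sketched here as an assumption rather than part of the original answer, is to first map them to 0..k-1 with np.unique(..., return_inverse=True) and then call np.bincount on those compact indices. The function name count_per_id and its value parameter are hypothetical.

```python
import numpy as np

def count_per_id(data, value=1):
    """Hypothetical helper: count rows with c == value per id in b, sorted by count (descending)."""
    # ids: sorted unique ids; inv: index of each row's id within ids (0..k-1)
    ids, inv = np.unique(data[:, 1], return_inverse=True)
    # sum 0/1 weights per compact id, then turn the float64 result back into ints
    count = np.bincount(inv, weights=(data[:, 2] == value).astype(float)).astype(int)
    # stable sort on -count: descending count, ties keep ascending id order
    order = np.argsort(-count, kind='mergesort')
    return np.column_stack((ids[order], count[order]))

data = np.array([
    [1, 10, 1],
    [2, 10, 1],
    [3,  7, 0],
    [4,  7, 1],
    [5, 42, 0],
])
print(count_per_id(data))
# [[10  2]
#  [ 7  1]
#  [42  0]]
```

On the question's original array, where the ids already are 0..4, this should give the same rows as the mergesort variant, just with integer counts instead of floats.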

Sample run:

Input array:

In [36]: data
Out[36]: 
array([[20,  0,  1],
       [22,  0,  1],
       [31,  0,  0],
       [49,  1,  0],
       [96,  1,  0],
       [57,  2,  1],
       [45,  3,  0],
       [12,  3,  0],
       [14,  3,  1],
       [33,  4,  1],
       [34,  4,  1],
       [15,  4,  1]])

Part 1:

In [37]: count = np.bincount(data[:,1],data[:,2]==1)
    ...: out = np.column_stack((np.unique(data[:,1]),count))
    ...: 

In [38]: out
Out[38]: 
array([[ 0.,  2.],
       [ 1.,  0.],
       [ 2.,  1.],
       [ 3.,  1.],
       [ 4.,  3.]])

Part 2:

In [39]: sidx = count.argsort()[::-1]
    ...: out = np.column_stack((sidx,count[sidx]))
    ...: 

In [40]: out
Out[40]: 
array([[ 4.,  3.],
       [ 0.,  2.],
       [ 3.,  1.],
       [ 2.,  1.],
       [ 1.,  0.]])

Part 3:

In [48]: sidx = (-count).argsort(kind='mergesort')

In [49]: np.column_stack((sidx,count[sidx]))
Out[49]: 
array([[ 4.,  3.],
       [ 0.,  2.],
       [ 2.,  1.],
       [ 3.,  1.],
       [ 1.,  0.]])

To sum all the integers in column c per id, simply drop the comparison with 1:

count = np.bincount(data[:,1],data[:,2])
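For the second scenario from the question, a complete sketch might look like this. It assumes, as the answer does, that the ids are 0..max(b); the cast back to int and the stable descending sort are additions so that the result matches the integer array the question asks for.

```python
import numpy as np

data = np.array([
    [20, 0, 2], [22, 0, 1], [31, 0, 0],
    [49, 1, 0], [96, 1, 0], [57, 2, 1],
    [45, 3, 0], [12, 3, 5], [14, 3, 1],
    [33, 4, 1], [34, 4, 3], [15, 4, 4],
])

# use the c values themselves as weights, so bincount sums c per id in b
count = np.bincount(data[:, 1], weights=data[:, 2]).astype(int)
# stable sort on -count: descending sum, ties keep ascending id order
sidx = (-count).argsort(kind='mergesort')
out = np.column_stack((sidx, count[sidx]))
print(out)
```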
