How to count unique IDs in each bin of a 2D histogram (Python or pandas)

Question

I have a CSV file and I would like to create a 2D histogram where the value in each bin is the number of unique IDs that fall in it. For example (see below), for the range 0<x<1 and 1<y<2, the value is 2 (A, B), not 3 (A, A, B), because A appears twice. Thanks!

ID	x	y
A	0.5	1.4
A	0.6	1.6
A	1.2	2.2
B	0.7	1.7
C	4.4	3.5
C	3.1	3.7

Answer 1

A bin i_x < x < j_x, i_y < y < j_y can be uniquely identified by the tuple (i_x, i_y), since this tuple is different for every bin. With unit-width bins, i_x and i_y are simply the floor values of x and y. For example, for the row (x, y) = (0.5, 1.4) the bin is 0 < 0.5 < 1, 1 < 1.4 < 2, so i_x = 0 = floor(0.5) and i_y = 1 = floor(1.4).
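To make the bin key concrete, here is a minimal sketch that maps a few of the sample rows to their (i_x, i_y) tuples. It assumes unit-width bins, as in the question; the names `points` and `keys` are only for illustration.

```python
import math

# First three (x, y) rows from the question's table.
points = [(0.5, 1.4), (0.6, 1.6), (1.2, 2.2)]

# With unit-width bins, the bin key is the floor of each coordinate.
keys = [(math.floor(x), math.floor(y)) for x, y in points]
print(keys)
```

The first two rows get the same key, which is why they land in the same bin.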

Approach:

Find i_x and i_y for the x and y columns.
Group the dataframe by the key (i_x, i_y) and count the unique IDs in each group.

Code:

>>> df
  ID    x    y
0  A  0.5  1.4
1  A  0.6  1.6
2  A  1.2  2.2
3  B  0.7  1.7
4  C  4.4  3.5
5  C  3.1  3.7

import numpy as np

df['bin_x'] = np.floor(df.x).astype(int)
df['bin_y'] = np.floor(df.y).astype(int)
# Count distinct IDs in each (bin_x, bin_y) group.
df = (df.groupby(['bin_x', 'bin_y'])['ID']
        .nunique()
        .reset_index(name='cnt'))


>>> df
   bin_x  bin_y  cnt
0      0      1    2
1      1      2    1
2      3      3    1
3      4      3    1

If you define your histogram as a NumPy array of size (5, 5), you can assign the cnt values to that array to get the desired histogram.

histogram = np.zeros((5, 5))
histogram[df.bin_x, df.bin_y] = df.cnt
>>> histogram
array([[0., 2., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 1., 0.]])
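
The floor trick only works for unit-width bins starting at integer values. If your bins have other edges, one possible variation (a sketch, not part of the original answer) is to compute bin indices with np.digitize, drop duplicate (ID, bin) pairs, and accumulate the rest. The function name `unique_id_hist2d` and its parameters are hypothetical, and bins are treated as half-open intervals [left, right).

```python
import numpy as np
import pandas as pd

def unique_id_hist2d(df, x_edges, y_edges, id_col="ID"):
    """Count distinct IDs per (x, y) bin for arbitrary increasing bin edges."""
    x_edges = np.asarray(x_edges)
    y_edges = np.asarray(y_edges)
    # np.digitize returns 1-based positions; subtract 1 for 0-based bin indices.
    ix = np.digitize(df["x"].to_numpy(), x_edges) - 1
    iy = np.digitize(df["y"].to_numpy(), y_edges) - 1
    binned = pd.DataFrame({id_col: df[id_col].to_numpy(), "ix": ix, "iy": iy})
    # Discard points that fall outside the given edges.
    inside = (
        (binned["ix"] >= 0) & (binned["ix"] < len(x_edges) - 1)
        & (binned["iy"] >= 0) & (binned["iy"] < len(y_edges) - 1)
    )
    # One row per (ID, bin) pair, so a repeated ID counts once per bin.
    pairs = binned[inside].drop_duplicates()
    hist = np.zeros((len(x_edges) - 1, len(y_edges) - 1), dtype=int)
    # np.add.at accumulates correctly when a bin index appears more than once.
    np.add.at(hist, (pairs["ix"].to_numpy(), pairs["iy"].to_numpy()), 1)
    return hist

df = pd.DataFrame({
    "ID": ["A", "A", "A", "B", "C", "C"],
    "x": [0.5, 0.6, 1.2, 0.7, 4.4, 3.1],
    "y": [1.4, 1.6, 2.2, 1.7, 3.5, 3.7],
})
edges = np.arange(0, 6)  # unit-width bins covering [0, 5)
hist = unique_id_hist2d(df, edges, edges)
print(hist)
```

With unit-width edges this gives the same counts as the histogram above; passing different `x_edges` and `y_edges` changes the binning without touching the rest of the logic.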

Solution 1
0 Accepted 2021-04-12 22:11:24
