In Python, how do I count unique values in a column over a steadily increasing number of rows within groups?

Question

I am working in Python with a pandas DataFrame and am trying to count the unique values of a column within groups. My problem is that the count has to be cumulative: for each row it should cover a steadily increasing number of rows within the group, and NaNs must not be counted.

Simplified, the data looks like this:

ID    occup  
   
1       NaN
1         A
1       NaN
1       NaN
1         B
2         K
2       NaN
2         L
2         L
2         M

The new column 'occupcount' should count, within the groups defined by 'ID', the number of unique values in 'occup', but cumulatively: in the first row of each group the count should consider only that first row, in the second row it should cover the first two rows, and in the fifth row it should cover all five rows of the group. It should look like this:

ID    occup    occupcount
   
 1      NaN             0
 1        A             1
 1      NaN             1
 1        B             2
 1        A             2
 2        K             1
 2      NaN             1
 2        L             2
 2        K             2
 2        M             3
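
Stated as plain Python, the rule above amounts to keeping one set of seen values per ID and reporting its size at every row. This is only a minimal sketch to pin down the expected behaviour, assuming missing values show up as None or float NaN; the function name `running_unique_counts` is made up for illustration.

```python
import math


def running_unique_counts(ids, values):
    """For each row, count the distinct non-missing values seen so far in its group."""
    seen_per_group = {}  # group id -> set of values seen so far
    counts = []
    for group_id, value in zip(ids, values):
        seen = seen_per_group.setdefault(group_id, set())
        # Assumption: a missing value is either None or a float NaN.
        is_missing = value is None or (isinstance(value, float) and math.isnan(value))
        if not is_missing:
            seen.add(value)
        counts.append(len(seen))
    return counts


# Sample data copied from the expected-output table above.
ids = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
occup = [None, "A", None, "B", "A", "K", None, "L", "K", "M"]
reference_counts = running_unique_counts(ids, occup)
```

Such a loop is slow on large frames, but it is handy as a cross-check for a vectorised pandas solution.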

I tried to solve the task with something like:

df['occupcount'] = (df.groupby(["ID"])['occup'].transform('nunique'))

But this only gives the total number of unique values over all rows within each group, with no gradual increase. Thanks in advance!
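
The behaviour described here follows from how `transform('nunique')` works: it computes one number per group and repeats it on every row of that group. A small sketch that makes this visible, assuming the sample frame from the expected-output table (the variable name `total_unique` is only illustrative):

```python
import numpy as np
import pandas as pd

# Sample frame copied from the expected-output table in the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "occup": [np.nan, "A", np.nan, "B", "A", "K", np.nan, "L", "K", "M"],
})

# One value per group (NaN is ignored by default), broadcast to all rows of the group.
total_unique = df.groupby("ID")["occup"].transform("nunique")
print(total_unique.tolist())
```

Every row of ID 1 gets the same number and every row of ID 2 gets the same number, so a running count needs a cumulative operation instead.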

Answer 1

The idea is to build a boolean mask that is True only for the first occurrence of each ('ID', 'occup') pair and only where 'occup' is not missing, and then to apply GroupBy.cumsum to that mask within each 'ID':

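# Mask: first occurrence of each (ID, occup) pair, excluding missing occup.
# The cumulative sum of that mask per ID is the running count of unique values.
df['occupcount'] = ((~df.duplicated(['ID', 'occup']) & df['occup'].notna())
                         .groupby(df['ID'])
                         .cumsum())
print(df)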
   ID occup  occupcount
0   1   NaN           0
1   1     A           1
2   1   NaN           1
3   1     B           2
4   1     A           2
5   2     K           1
6   2   NaN           1
7   2     L           2
8   2     L           2
9   2     M           3
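
To make the mask easier to follow, the one-liner above can be split into named steps. This is a sketch under the assumption that the frame matches the expected-output table from the question; the intermediate names `first_seen`, `present` and `new_value` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Sample frame copied from the expected-output table in the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "occup": [np.nan, "A", np.nan, "B", "A", "K", np.nan, "L", "K", "M"],
})

# True on the first occurrence of each (ID, occup) pair.
# duplicated() treats repeated NaN as duplicates, so NaN needs separate handling.
first_seen = ~df.duplicated(["ID", "occup"])

# True where occup is actually present.
present = df["occup"].notna()

# A row contributes to the count only if it introduces a new, non-missing value.
new_value = first_seen & present

# Running total of new values within each ID.
df["occupcount"] = new_value.groupby(df["ID"]).cumsum()
print(df)
```

Because each distinct value is counted once, at its first appearance, the cumulative sum never decreases and repeated values or NaNs leave it unchanged.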

Answer 1: 4, accepted, 2021-11-09 14:58:15
