How, in Python, can I count unique values in a column for gradually increasing numbers of rows within groups?

I am working in Python on a pandas DataFrame and am trying to count the unique values of a column within groups. My problem is that the count has to cover a steadily increasing number of rows within each group, and NaNs must not be counted.

Simplified, the data looks like this:

ID    occup  
   
1       NaN
1         A
1       NaN
1       NaN
1         B
2         K
2       NaN
2         L
2         L
2         M 

The new column 'occupcount' should count, within the groups defined by 'ID', the number of unique values in 'occup', but as a running count. In the first row of each group, the count should consider only that first row. In the second row, it should cover the first two rows. In the fifth row, it should cover all five rows of the group. It should look like this:

ID    occup    occupcount
   
 1      NaN             0
 1        A             1
 1      NaN             1
 1        B             2
 1        A             2
 2        K             1
 2      NaN             1
 2        L             2
 2        K             2
 2        M             3 

I tried to solve the task with something like:

df['occupcount'] = (df.groupby(["ID"])['occup'].transform('nunique'))

But this only gives the total number of unique values over all rows of each group, repeated on every row, with no gradual increase. Thanks in advance!
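To make the behaviour of this attempt concrete, here is a minimal sketch. It assumes the 'occup' values from the expected-output table above, and the variable name `total_per_group` is only illustrative. `transform('nunique')` computes one number per group and broadcasts it to every row of that group, which is why the result never changes inside a group.

```python
import numpy as np
import pandas as pd

# Assumed sample data, copied from the expected-output table in the question.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "occup": [np.nan, "A", np.nan, "B", "A", "K", np.nan, "L", "K", "M"],
})

# One distinct count per group (NaN is ignored by default),
# repeated on every row of that group.
total_per_group = df.groupby("ID")["occup"].transform("nunique")
print(total_per_group.tolist())
# [2, 2, 2, 2, 2, 3, 3, 3, 3, 3]
```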

The idea is to build a boolean mask that is True only for the first occurrence of each ('ID', 'occup') pair, combine it with a check that 'occup' is not missing, and then take the running sum of that mask within each group with GroupBy.cumsum:

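# True only at the first occurrence of each (ID, occup) pair
first_seen = ~df.duplicated(['ID', 'occup'])
# Missing values must never increase the count
not_missing = df['occup'].notna()
# Within each ID, the running sum of the mask is the running distinct count
df['occupcount'] = (first_seen & not_missing).groupby(df['ID']).cumsum()
print(df)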
   ID occup  occupcount
0   1   NaN           0
1   1     A           1
2   1   NaN           1
3   1     B           2
4   1     A           2
5   2     K           1
6   2   NaN           1
7   2     L           2
8   2     L           2
9   2     M           3
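For anyone who wants to run this end to end, the following is a self-contained sketch rather than a definitive implementation. The sample data is copied from the printed output above. The helper names `running_nunique` and `brute_force` are hypothetical, and the explicit `astype(int)` is an added precaution so that the cumulative sum runs on integers rather than booleans. The plain-Python loop is included only as a cross-check of the vectorised result.

```python
import numpy as np
import pandas as pd

# Sample data copied from the printed output above.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "occup": [np.nan, "A", np.nan, "B", "A", "K", np.nan, "L", "L", "M"],
})


def running_nunique(frame, group_col, value_col):
    """Running count of distinct non-missing values per group (hypothetical helper)."""
    # First occurrence of each (group, value) pair.
    first_seen = ~frame.duplicated([group_col, value_col])
    # Missing values never add to the count.
    not_missing = frame[value_col].notna()
    mask = (first_seen & not_missing).astype(int)
    return mask.groupby(frame[group_col]).cumsum()


def brute_force(frame, group_col, value_col):
    """Slow reference version: keep a set of seen values per group."""
    seen = {}
    counts = []
    for key, value in zip(frame[group_col], frame[value_col]):
        bucket = seen.setdefault(key, set())
        if pd.notna(value):
            bucket.add(value)
        counts.append(len(bucket))
    return counts


df["occupcount"] = running_nunique(df, "ID", "occup")
print(df["occupcount"].tolist())
# [0, 1, 1, 2, 2, 1, 1, 2, 2, 3]
```

Because the mask is built from `duplicated` over the whole frame, the rows of a group do not need to be contiguous. The count follows the existing row order, so sort the frame first if a different order is intended.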
