How to count cumulative unique values by group in Python?

Question

How can I count cumulative unique values by group in Python?

Below is an example dataframe:

|Group|Year|Type|
|-----|----|----|
|A|1998|red|
|A|2002|red|
|A|2005|blue|
|A|2008|blue|
|A|2009|yello|
|B|1998|red|
|B|2001|red|
|B|2003|red|
|C|1996|red|
|C|2002|orange|
|C|2008|blue|
|C|2012|yello|

I need to create a new column, computed within each value of column "Group". The value of this new column should be the cumulative number of unique values of column "Type", accumulated in order of column "Year".

Below is the dataframe I want. For example: for group A in 1998, the cumulative number of unique "Type" values is 1. For group A in 2005, it is 2. For group C in 2012, it is 4.

| Group| Year| Type|Want|
|------|-----|-----|----|
|A|1998|red|1|
|A|2002|red|1|
|A|2005|blue|2|
|A|2008|blue|2|
|A|2009|yello|3|
|B|1998|red|1|
|B|2001|red|1|
|B|2003|red|1|
|C|1996|red|1|
|C|2002|orange|2|
|C|2008|blue|3|
|C|2012|yello|4|

One more thing about this dataframe: not all groups have values in the same years. For example, group A has values in 1998, 2002, 2005, 2008, and 2009, while group B has values in 1998, 2001, and 2003.

How can I solve this? Any help is much appreciated. Thanks!

Answer 1

Use a custom lambda function with factorize in GroupBy.transform:

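import pandas as pd
# factorize labels each distinct value 0, 1, 2, ... in order of first appearance
f = lambda x: pd.factorize(x)[0]
# add 1 so the first distinct Type in each group counts as 1
df['Want1'] = df.groupby('Group', sort=False)['Type'].transform(f) + 1
print(df)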
   Group  Year    Type  Want1
0      A  1998     red      1
1      A  2002     red      1
2      A  2005    blue      2
3      A  2008    blue      2
4      A  2009   yello      3
5      B  1998     red      1
6      B  2001     red      1
7      B  2003     red      1
8      C  1996     red      1
9      C  2002  orange      2
10     C  2008    blue      3
11     C  2012   yello      4
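
Note that factorize returns the code of each value's first appearance, so the result above equals the cumulative unique count only while a Type never reappears after a different Type within a group (true for the sample data). As a minimal sketch of a variant that also handles reappearing values, the following flags the first occurrence of each (Group, Type) pair and takes a running sum per group. The helper name `cumulative_unique` and the small `demo` frame are made up for illustration; the rows are sorted by Group and Year first, on the assumption that the input may not already be ordered.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Group": list("AAAAABBBCCCC"),
    "Year": [1998, 2002, 2005, 2008, 2009, 1998, 2001, 2003,
             1996, 2002, 2008, 2012],
    "Type": ["red", "red", "blue", "blue", "yello", "red", "red", "red",
             "red", "orange", "blue", "yello"],
})


def cumulative_unique(frame):
    """Hypothetical helper: running count of distinct Type values per Group."""
    # Accumulate in Year order within each group.
    ordered = frame.sort_values(["Group", "Year"])
    # True on the first row where a (Group, Type) pair appears.
    first_seen = ~ordered.duplicated(["Group", "Type"])
    # Running total of first appearances inside each group.
    counts = first_seen.astype(int).groupby(ordered["Group"]).cumsum()
    # Restore the caller's original row order.
    return counts.reindex(frame.index)


df["Want"] = cumulative_unique(df)
print(df)

# Illustrative case where a Type comes back after a different one.
demo = pd.DataFrame({
    "Group": ["A", "A", "A"],
    "Year": [2000, 2001, 2002],
    "Type": ["red", "blue", "red"],
})
demo_counts = cumulative_unique(demo).tolist()
factorize_codes = (pd.factorize(demo["Type"])[0] + 1).tolist()
print(demo_counts)       # running count of distinct values
print(factorize_codes)   # first-appearance codes, which drop back for the last row
```

On the question's data both approaches give the same column; they differ only on inputs like `demo`, where the last row repeats an earlier Type.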
