How can I count cumulative unique values by group in Python (pandas)?
Below is the dataframe example:
Group | Year | Type |
---|---|---|
A | 1998 | red |
A | 2002 | red |
A | 2005 | blue |
A | 2008 | blue |
A | 2009 | yello |
B | 1998 | red |
B | 2001 | red |
B | 2003 | red |
C | 1996 | red |
C | 2002 | orange |
C | 2008 | blue |
C | 2012 | yello |
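For anyone who wants to reproduce the problem, the table above can be built as a pandas DataFrame roughly like this. This is a minimal sketch: the variable name `df` is an assumption chosen for illustration, and the values are copied as-is from the table.

```python
import pandas as pd

# Example data copied from the table above (values kept exactly as posted).
df = pd.DataFrame({
    "Group": ["A"] * 5 + ["B"] * 3 + ["C"] * 4,
    "Year": [1998, 2002, 2005, 2008, 2009,
             1998, 2001, 2003,
             1996, 2002, 2008, 2012],
    "Type": ["red", "red", "blue", "blue", "yello",
             "red", "red", "red",
             "red", "orange", "blue", "yello"],
})
print(df)
```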
I need to create a new column, computed separately within each "Group". Its value should be the cumulative number of unique values in the "Type" column, accumulated in order of "Year".
Below is the dataframe I want. For example, for group A in 1998, the cumulative number of unique "Type" values is 1. For group A in 2005, it is 2. For group C in 2012, it is 4.
| Group| Year| Type|Want|
|------|-----|-----|----|
|A|1998|red|1|
|A|2002|red|1|
|A|2005|blue|2|
|A|2008|blue|2|
|A|2009|yello|3|
|B|1998|red|1|
|B|2001|red|1|
|B|2003|red|1|
|C|1996|red|1|
|C|2002|orange|2|
|C|2008|blue|3|
|C|2012|yello|4|
One more thing about this dataframe: not all groups have values in the same years. For example, group A has values in 1998, 2002, 2005, 2008, and 2009, while group B has values in 1998, 2001, and 2003.
How can I solve this? Any help would be much appreciated. Thanks!
Use a custom lambda function with factorize in GroupBy.transform:
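import pandas as pd

# factorize returns 0-based codes in order of first appearance; rows must already be sorted by Year within each group
f = lambda x: pd.factorize(x)[0]
df['Want1'] = df.groupby('Group', sort=False)['Type'].transform(f) + 1
print(df)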
Group Year Type Want1
0 A 1998 red 1
1 A 2002 red 1
2 A 2005 blue 2
3 A 2008 blue 2
4 A 2009 yello 3
5 B 1998 red 1
6 B 2001 red 1
7 B 2003 red 1
8 C 1996 red 1
9 C 2002 orange 2
10 C 2008 blue 3
11 C 2012 yello 4
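One caveat: factorize numbers each value by its first appearance, so it matches the cumulative unique count only when a type never reappears after a different type within the same group. For a sequence such as red, blue, red it would give 1, 2, 1 rather than 1, 2, 2. A possible alternative, shown here as a sketch rather than as part of the original answer, is to flag the first occurrence of each (Group, Type) pair and take a running sum per group. The column name `Want2` and the helper name `is_new` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["A"] * 5 + ["B"] * 3 + ["C"] * 4,
    "Year": [1998, 2002, 2005, 2008, 2009,
             1998, 2001, 2003,
             1996, 2002, 2008, 2012],
    "Type": ["red", "red", "blue", "blue", "yello",
             "red", "red", "red",
             "red", "orange", "blue", "yello"],
})

# Make sure rows are in chronological order within each group.
df = df.sort_values(["Group", "Year"]).reset_index(drop=True)

# True on the first occurrence of each (Group, Type) pair, False on repeats.
is_new = ~df.duplicated(subset=["Group", "Type"])

# Running total of first occurrences within each group.
df["Want2"] = is_new.astype(int).groupby(df["Group"]).cumsum()
print(df)
```

On the example data this should give the same column as the factorize version; the two only differ when a type recurs after another type has appeared in the same group.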