I have a dataframe 'df', with the following structure:
Input:
ID | Product | Price |
---|---|---|
1 | P1 | 10 |
2 | P1 | 11 |
3 | P2 | 12 |
4 | P2 | 12 |
5 | P2 | 15 |
Expected Output:
ID | Product | Price | Distinct_Running_Count |
---|---|---|---|
1 | P1 | 10 | 1 |
2 | P1 | 11 | 2 |
3 | P2 | 12 | 1 |
4 | P2 | 12 | 1 |
5 | P2 | 15 | 2 |
Problem:
I want to create a new column called 'Distinct_Running_Count' that holds, for each row, the running count of distinct Price values seen so far within that row's Product.
Solutions Tried:
df['Distinct_Running_Count'] = df.groupby(['Product', 'Price']).cumcount() + 1
df['Distinct_Running_Count'] = df.groupby(['Product', 'Price']).transform('nunique')
Issue:
The solutions above give either a running row count or the total number of unique values, but not the running distinct count I expect.
You can compare each row with the previous row in the Price column within each Product, and take the cumulative sum of the changes:
df['Distinct_Running_Count'] = (df.groupby(['Product'])['Price']
                                .transform(lambda col: col.ne(col.shift().fillna(col)).cumsum().add(1)))
print(df)
ID Product Price Distinct_Running_Count
0 1 P1 10 1
1 2 P1 11 2
2 3 P2 12 1
3 4 P2 12 1
4 5 P2 15 2
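One thing to keep in mind: the shift comparison above counts changes between adjacent rows, so it only equals a distinct count when equal prices sit next to each other within a product. If that is not guaranteed, a variation like the following might help. It is only a sketch: the helper name `running_distinct` is made up for illustration, and it assumes Price has no missing values (`pd.factorize` labels NaN as -1).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Product": ["P1", "P1", "P2", "P2", "P2"],
    "Price": [10, 11, 12, 12, 15],
})

def running_distinct(prices: pd.Series) -> pd.Series:
    # Hypothetical helper, not a pandas function.
    # factorize labels each distinct value 0, 1, 2, ... in order of first appearance.
    codes = pd.Series(pd.factorize(prices)[0], index=prices.index)
    # The largest label seen so far, plus one, is the number of
    # distinct values seen so far.
    return codes.cummax() + 1

df["Distinct_Running_Count"] = (
    df.groupby("Product")["Price"].transform(running_distinct)
)
print(df)

# A price that comes back after a different one is not counted again:
print(running_distinct(pd.Series([12, 15, 12])).tolist())  # [1, 2, 2]
```

On the question's data this gives the same column as the shift version. The two only differ when a price repeats after a different price in the same product.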
My answer uses a few steps. First, get the unique rows (based on Product and Price).
Then, use cumcount() to create your desired column.
Finally, merge this dataframe with your original dataframe.
df_without_dup = df[~df[['Product', 'Price']].duplicated()][['Product', 'Price']]
df_without_dup['Distinct_Running_Count'] = df_without_dup.groupby(['Product']).cumcount() + 1
df = df.merge(df_without_dup, on=['Product', 'Price'], how='left')
df_without_dup =
Product Price Distinct_Running_Count
0 P1 10 1
1 P1 11 2
2 P2 12 1
4 P2 15 2
Output:
ID Product Price Distinct_Running_Count
0 1 P1 10 1
1 2 P1 11 2
2 3 P2 12 1
3 4 P2 12 1
4 5 P2 15 2
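For anyone who wants to run the three steps end to end, here is a self-contained sketch on the question's data. The names `pairs`, `result` and `compact` are just illustrative, and the `.copy()` is an assumption on my part to avoid assigning into a slice of `df`. The last part shows a shorter form that should give the same numbering (each price labelled by its order of first appearance within the product), again assuming Price has no missing values.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Product": ["P1", "P1", "P2", "P2", "P2"],
    "Price": [10, 11, 12, 12, 15],
})

# Step 1: one row per distinct (Product, Price) pair, in order of first appearance.
pairs = df[["Product", "Price"]].drop_duplicates().copy()

# Step 2: number the pairs within each Product, starting at 1.
pairs["Distinct_Running_Count"] = pairs.groupby("Product").cumcount() + 1

# Step 3: bring the numbering back onto every original row.
result = df.merge(pairs, on=["Product", "Price"], how="left")
print(result)

# Shorter form without the merge: label each price by its order of
# first appearance within the product.
compact = df.groupby("Product")["Price"].transform(
    lambda s: pd.Series(pd.factorize(s)[0] + 1, index=s.index)
)
print(compact.tolist())
```

Note that with this numbering a price that reappears later in the same product gets its original number back, rather than the count of distinct prices seen up to that row.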