I have a pandas dataframe which looks like this:
   Ref   Value
1  SKU1  A
2  SKU2  A
3  SKU3  B
4  SKU2  A
5  SKU1  B
6  SKU3  C
I would like to create a new column whose value depends on whether all the rows for a given Ref have the same Value. For instance, if both rows for SKU1 have the same value, display "good"; if not, display "bad". The dataframe will usually have 2 rows per Ref, but sometimes more; in that case, "good" means that all of them match each other.
With the example above, this would be:
   Ref   Value  NewCol
1  SKU1  A      bad
2  SKU2  A      good
3  SKU3  B      bad
4  SKU2  A      good
5  SKU1  B      bad
6  SKU3  C      bad
What would be the best way to implement this? In my example, Value can only be A, B or C, but Ref has thousands of distinct entries, which is why I am struggling.
Many thanks in advance!
Let's try groupby().transform('nunique') to count the distinct values within each Ref; a Ref is "good" exactly when that count is 1:
import numpy as np

# number of distinct Values per Ref, broadcast back to every row of that Ref
df['NewCol'] = np.where(df.groupby('Ref')['Value'].transform('nunique') == 1,
                        'good', 'bad')
Output:
   Ref   Value  NewCol
1  SKU1  A      bad
2  SKU2  A      good
3  SKU3  B      bad
4  SKU2  A      good
5  SKU1  B      bad
6  SKU3  C      bad
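A minimal self-contained sketch of the same idea, assuming the sample frame from the question (index 1 to 6 as printed). It swaps np.where for a boolean Series mapped to the two labels; the result is the same. Because nunique counts distinct values per Ref, the check works unchanged whether a Ref has 2 rows or more.

```python
import pandas as pd

# Sample data from the question, with the index 1..6 as printed there
df = pd.DataFrame(
    {"Ref": ["SKU1", "SKU2", "SKU3", "SKU2", "SKU1", "SKU3"],
     "Value": ["A", "A", "B", "A", "B", "C"]},
    index=range(1, 7),
)

# transform('nunique') returns, for every row, the number of distinct
# Values in that row's Ref group, aligned to the original index
n_distinct = df.groupby("Ref")["Value"].transform("nunique")

# Exactly one distinct value means every row of the group matches
df["NewCol"] = n_distinct.eq(1).map({True: "good", False: "bad"})

print(df)
```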
Update, per a comment (a Ref whose values are exactly A and B should be labeled "average"):
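# set of distinct Values for each Ref, mapped back to every row
s = df['Ref'].map(df.groupby('Ref')['Value'].apply(set))
# the first matching condition wins; anything else falls through to 'bad'
df['NewCol'] = np.select((s.str.len() == 1, s.eq({'A', 'B'})),
                         ('good', 'average'), 'bad')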
Output:
   Ref   Value  NewCol
1  SKU1  A      average
2  SKU2  A      good
3  SKU3  B      bad
4  SKU2  A      good
5  SKU1  B      average
6  SKU3  C      bad
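The same three-way rule can also be written as one labeling function applied once per Ref and then mapped back to the rows. This is a sketch, not the answer's method: the helper name label_group is hypothetical, and it compares plain Python sets inside the function instead of comparing a Series against a set with eq.

```python
import pandas as pd

# Sample data from the question, with the index 1..6 as printed there
df = pd.DataFrame(
    {"Ref": ["SKU1", "SKU2", "SKU3", "SKU2", "SKU1", "SKU3"],
     "Value": ["A", "A", "B", "A", "B", "C"]},
    index=range(1, 7),
)

def label_group(values: pd.Series) -> str:
    """Hypothetical helper: label one Ref group from its set of Values."""
    distinct = set(values)
    if len(distinct) == 1:
        return "good"       # all rows of the Ref agree
    if distinct == {"A", "B"}:
        return "average"    # the combination singled out in the update
    return "bad"

# One label per Ref (the result is indexed by Ref) ...
labels = df.groupby("Ref")["Value"].agg(label_group)
# ... then broadcast back to each row through the Ref column
df["NewCol"] = df["Ref"].map(labels)

print(df)
```

This runs Python code once per Ref rather than once per row, so its cost grows with the number of distinct Refs; the rule stays readable if more combinations need their own label.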