Pandas DataFrame : Create a column based on values from different rows

I have a pandas DataFrame that looks like this:

    Ref       Value
1   SKU1       A
2   SKU2       A           
3   SKU3       B
4   SKU2       A
5   SKU1       B
6   SKU3       C           

I would like to create a new column based on whether the values for a given Ref match. For instance, if both rows for SKU1 have the same value, display "good"; if not, display "bad". The DataFrame will usually have 2 rows for each Ref, but sometimes more; in that case, "good" means that all of them match each other.

With the example above, the result would be:

    Ref       Value    NewCol
1   SKU1       A        bad
2   SKU2       A        good   
3   SKU3       B        bad
4   SKU2       A        good  
5   SKU1       B        bad
6   SKU3       C        bad        

What would be the best way to implement this? In my example, Value can only be A, B or C, but Ref has thousands of different entries, which is why I am struggling.

Many thanks in advance!
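Stated without pandas, the rule is: collect the set of distinct Values for each Ref, and a row is "good" exactly when its Ref's set has size 1. The sketch below is a plain-Python reference check of that rule, assuming the data is available as a list of (Ref, Value) pairs (the names rows and label_rows are illustrative, not from the question). It makes a single pass over the rows, so thousands of Refs are not a problem in principle.

```python
from collections import defaultdict

# Sample data from the question as (Ref, Value) pairs, in row order
rows = [("SKU1", "A"), ("SKU2", "A"), ("SKU3", "B"),
        ("SKU2", "A"), ("SKU1", "B"), ("SKU3", "C")]


def label_rows(rows):
    """Return 'good' for rows whose Ref has a single distinct Value, else 'bad'."""
    values_per_ref = defaultdict(set)
    for ref, value in rows:
        values_per_ref[ref].add(value)
    # A Ref is 'good' only if all of its rows agree, i.e. its value set has size 1
    return ["good" if len(values_per_ref[ref]) == 1 else "bad" for ref, _ in rows]


print(label_rows(rows))
# ['bad', 'good', 'bad', 'good', 'bad', 'bad']
```

This works for any number of rows per Ref, which matches the "they all match with each other" requirement.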

Let's try groupby().transform('nunique') to count the distinct values within each Ref; a Ref is "good" when that count is 1:

import numpy as np
# 'good' when all rows of a Ref share a single distinct Value, else 'bad'
df['NewCol'] = np.where(df.groupby('Ref')['Value'].transform('nunique') == 1,
                        'good', 'bad')

Output:

    Ref Value NewCol
1  SKU1     A    bad
2  SKU2     A   good
3  SKU3     B    bad
4  SKU2     A   good
5  SKU1     B    bad
6  SKU3     C    bad
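For reference, here is a self-contained sketch of the same nunique idea that rebuilds the sample frame (with the question's 1-based index) so it can be run as-is. As a variation, it swaps np.where for Series.map on the boolean mask; both produce the same column, and this version simply avoids the numpy import.

```python
import pandas as pd

# Rebuild the sample frame, using the same 1-based index as the question
df = pd.DataFrame(
    {"Ref": ["SKU1", "SKU2", "SKU3", "SKU2", "SKU1", "SKU3"],
     "Value": ["A", "A", "B", "A", "B", "C"]},
    index=range(1, 7),
)

# transform('nunique') broadcasts each group's distinct-value count back to every row
n_distinct = df.groupby("Ref")["Value"].transform("nunique")

# Map the boolean "all values agree" flag to labels
df["NewCol"] = n_distinct.eq(1).map({True: "good", False: "bad"})
print(df)
```

Because transform returns a result aligned to the original rows, no merge or join back onto df is needed.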

Update, per a comment (label a Ref "average" when its values are exactly A and B):

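import numpy as np
# For every row, the set of distinct Values seen for its Ref
s = df['Ref'].map(df.groupby('Ref')['Value'].apply(set))

# One distinct value -> 'good'; exactly {A, B} -> 'average'; otherwise 'bad'
df['NewCol'] = np.select((s.map(len) == 1, s.eq({'A', 'B'})),
                         ('good', 'average'), 'bad')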

Output:

    Ref Value   NewCol
1  SKU1     A  average
2  SKU2     A     good
3  SKU3     B      bad
4  SKU2     A     good
5  SKU1     B  average
6  SKU3     C      bad
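If comparing a Series against a set feels fragile, one alternative sketch is to compute a single label per Ref with a plain Python function and then broadcast it back to the rows with map. The helper label_values is hypothetical and simply encodes the same three cases as the np.select version above; adjust the conditions if your actual rules differ.

```python
import pandas as pd

df = pd.DataFrame(
    {"Ref": ["SKU1", "SKU2", "SKU3", "SKU2", "SKU1", "SKU3"],
     "Value": ["A", "A", "B", "A", "B", "C"]},
    index=range(1, 7),
)


def label_values(values):
    """Hypothetical helper: label one Ref from the Values of its rows."""
    distinct = set(values)
    if len(distinct) == 1:
        return "good"
    if distinct == {"A", "B"}:
        return "average"
    return "bad"


# One label per Ref (the result is indexed by Ref), then broadcast back to the rows
labels = df.groupby("Ref")["Value"].agg(label_values)
df["NewCol"] = df["Ref"].map(labels)
print(df)
```

The Python function runs once per Ref rather than once per row, so with thousands of Refs this is usually acceptable, though the vectorised nunique approach will generally be faster for the plain good/bad case.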
