
Change the value of a pandas DataFrame column based on a condition that also depends on other columns of the DataFrame

    Category              DishName   Id 
0   a                     Pistachio  621f4884e48bc60012364b13   
1   a                     Pistachio  621f4884e48bc60012364b13   
2   a                     Pistachio  621f4884e48bc60012364b13   
3   a                     achar      621f4884e48bc60012364b13   
4   b                     achar      621f4884e48bc60012364b13   
5   b                     achar      621f4884e48bc60012364b13   
6   a                     chicken    621f4884e48bc60012364b13   
7   b                     chicken    621f4884e48bc60012364b13   
8   c                     chicken    621f4884e48bc60012364b13 

My dataframe has 3 columns: Category, DishName and Id. For each combination of Id and DishName, I have to assign a single Category to all of its rows.

Assign "a" if all the Category values are "a"

Assign "b" if the Category values are "a" and "b"

Assign "c" if the Category values are "a", "b" and "c"

Expected output is

    Category              DishName   Id 
0   a                     Pistachio  621f4884e48bc60012364b13   
1   a                     Pistachio  621f4884e48bc60012364b13   
2   a                     Pistachio  621f4884e48bc60012364b13   
3   b                     achar      621f4884e48bc60012364b13   
4   b                     achar      621f4884e48bc60012364b13   
5   b                     achar      621f4884e48bc60012364b13   
6   c                     chicken    621f4884e48bc60012364b13   
7   c                     chicken    621f4884e48bc60012364b13   
8   c                     chicken    621f4884e48bc60012364b13 
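
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame shown above and lists which Category values occur for each DishName. The variable names (`sample_id`, `categories_per_dish`) are only illustrative and are not part of the original post.

```python
import pandas as pd

# Rebuild the sample data from the question; every row shares the same Id.
sample_id = "621f4884e48bc60012364b13"
df = pd.DataFrame({
    "Category": ["a", "a", "a", "a", "b", "b", "a", "b", "c"],
    "DishName": ["Pistachio"] * 3 + ["achar"] * 3 + ["chicken"] * 3,
    "Id": [sample_id] * 9,
})

# Collect the distinct Category values seen for each dish.
categories_per_dish = {
    dish: sorted(group.unique())
    for dish, group in df.groupby("DishName")["Category"]
}
print(categories_per_dish)
```

Pistachio only has "a", achar has "a" and "b", and chicken has all three, which is why the expected result is a, b and c respectively.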

You can transform to ordered Categorical and get the max per group:

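import pandas as pd

# Wrap Category in an ordered Categorical (a < b < c), keeping the original index,
# then broadcast the highest category of each DishName group back to its rows.
df['Category'] = (pd
                  .Series(pd.Categorical(df['Category'],
                                         categories=['a', 'b', 'c'], ordered=True),
                          index=df.index)
                  .groupby(df['DishName'])
                  .transform('max')
                  )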

NB. You wouldn't need the Categorical for plain a, b, c, as those three already sort lexicographically, but I imagine a real-life case wouldn't necessarily. For example, low < medium < high is ordered logically but not lexicographically.
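
To illustrate that point, here is a small sketch with made-up data (the `Level` column, the dish names and the variable names are assumptions, not from the question). It compares the max of an ordered Categorical with the max of the raw strings.

```python
import pandas as pd

# Hypothetical data: the logical order low < medium < high is not alphabetical.
df_levels = pd.DataFrame({
    "Level": ["low", "high", "medium", "low", "low", "medium"],
    "DishName": ["soup", "soup", "soup", "salad", "rice", "rice"],
})

# Ordered Categorical: max follows the declared order.
ordered = pd.Series(
    pd.Categorical(df_levels["Level"],
                   categories=["low", "medium", "high"], ordered=True),
    index=df_levels.index,
)
categorical_max = ordered.groupby(df_levels["DishName"]).transform("max")

# Plain strings: max follows alphabetical order, so "medium" beats "high".
string_max = df_levels.groupby("DishName")["Level"].transform("max")

print(pd.DataFrame({"categorical": categorical_max, "string": string_max}))
```

For the soup rows the Categorical version gives "high", while the plain string version gives "medium", because "m" sorts after "h" and "l".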

Output:

  Category   DishName                        Id
0        a  Pistachio  621f4884e48bc60012364b13
1        a  Pistachio  621f4884e48bc60012364b13
2        a  Pistachio  621f4884e48bc60012364b13
3        b      achar  621f4884e48bc60012364b13
4        b      achar  621f4884e48bc60012364b13
5        b      achar  621f4884e48bc60012364b13
6        c    chicken  621f4884e48bc60012364b13
7        c    chicken  621f4884e48bc60012364b13
8        c    chicken  621f4884e48bc60012364b13

For plain a, b, c values, which already sort lexicographically, a direct groupby max is enough:

df['Category'] = df.groupby('DishName')['Category'].transform('max')

Output:

  Category   DishName                        Id
0        a  Pistachio  621f4884e48bc60012364b13
1        a  Pistachio  621f4884e48bc60012364b13
2        a  Pistachio  621f4884e48bc60012364b13
3        b      achar  621f4884e48bc60012364b13
4        b      achar  621f4884e48bc60012364b13
5        b      achar  621f4884e48bc60012364b13
6        c    chicken  621f4884e48bc60012364b13
7        c    chicken  621f4884e48bc60012364b13
8        c    chicken  621f4884e48bc60012364b13
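
The question mentions taking both the Id and the DishName into account, while the sample data only contains a single Id. If the same dish can appear under several Ids, one option is to group on both columns. This is a sketch using hypothetical short ids (`id_1`, `id_2`) rather than the real ones:

```python
import pandas as pd

# Hypothetical data: the same dish appears under two different ids.
df_multi = pd.DataFrame({
    "Category": ["a", "a", "b", "a", "c"],
    "DishName": ["achar"] * 5,
    "Id": ["id_1", "id_1", "id_1", "id_2", "id_2"],
})

# Group on both keys so each (Id, DishName) pair gets its own max.
by_both = df_multi.groupby(["Id", "DishName"])["Category"].transform("max")

# Grouping on DishName alone mixes the two ids together.
by_dish = df_multi.groupby("DishName")["Category"].transform("max")

print(df_multi.assign(by_both=by_both, by_dish=by_dish))
```

With both keys, the `id_1` rows become "b" and the `id_2` rows become "c". Grouping on DishName alone would assign "c" to every row.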


 