
Assigning values in a pandas DataFrame per unique value in another column

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({"marks": [40, 60, 90, 20, 100, 10, 30, 70],
                   "students": ["Jack", "Jack", "Jack", "Jack", "John", "John", "John", "John"]})

   marks  students
0  40      Jack
1  60      Jack
2  90      Jack
3  20      Jack
4  100     John
5  10      John
6  30      John
7  70      John

I am trying to replace each of a student's marks below 40 with that student's average (the average is taken over all of the student's marks, including the low ones).
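For the sample data, the per-student averages (over all four marks, low ones included) work out as follows, which is why every replaced value in the mean-based answers below is 52.5 for both students:

```latex
\bar{m}_{\text{Jack}} = \frac{40 + 60 + 90 + 20}{4} = \frac{210}{4} = 52.5,
\qquad
\bar{m}_{\text{John}} = \frac{100 + 10 + 30 + 70}{4} = \frac{210}{4} = 52.5
```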

I know how to assign a mark based on the < 40 condition (here I assigned the lowest mark in the whole DataFrame to all marks below 40):

df.loc[df["marks"] < 40, "marks"] = df["marks"].min()

But I am unsure how to do this per student, for example by applying a lambda function to each unique student name. Any help would be appreciated.
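Since the question asks specifically about a lambda, here is a minimal sketch of that route, assuming the sample data above: `groupby('students')['marks'].transform` passes each student's marks to the lambda as a Series, and `Series.where` keeps marks of at least 40 while replacing the rest with that student's mean. The column name `corrected_marks` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "marks": [40, 60, 90, 20, 100, 10, 30, 70],
    "students": ["Jack", "Jack", "Jack", "Jack", "John", "John", "John", "John"],
})

# transform calls the lambda once per student; s holds that student's marks.
# where keeps values meeting the condition and replaces the others with s.mean(),
# which is computed over all of the student's marks, including the low ones.
df["corrected_marks"] = (
    df.groupby("students")["marks"]
      .transform(lambda s: s.where(s >= 40, s.mean()))
)
print(df)
```

The lambda version works, but passing the string "mean" to transform (as in the answers below) avoids calling Python code once per group.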

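You can combine groupby with where. transform('mean') returns each student's mean on every row of that student, and where keeps marks of at least 40 and takes the mean everywhere else: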

df['corrected_marks'] = df['marks'].where(df['marks'] >= 40,
                                          df.groupby('students')['marks'].transform('mean'))

output:

   marks students  corrected_marks
0     40     Jack             40.0
1     60     Jack             60.0
2     90     Jack             90.0
3     20     Jack             52.5
4    100     John            100.0
5     10     John             52.5
6     30     John             52.5
7     70     John             70.0
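The same replacement can also be written with `Series.mask`, the inverse of `where`: `mask` replaces values where the condition is True. This is a sketch for comparison, not part of the original answer, and it assumes the sample data from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "marks": [40, 60, 90, 20, 100, 10, 30, 70],
    "students": ["Jack", "Jack", "Jack", "Jack", "John", "John", "John", "John"],
})

# Per-student mean broadcast back to every row (same length and index as df)
student_mean = df.groupby("students")["marks"].transform("mean")

# mask replaces values where the condition is True, so this mirrors
# df["marks"].where(df["marks"] >= 40, student_mean)
df["corrected_marks"] = df["marks"].mask(df["marks"] < 40, student_mean)
print(df)
```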

Try np.where (this needs numpy imported as np):

import numpy as np

df['marks'] = np.where(df['marks'] < 40,
                       df.groupby('students')['marks'].transform('mean'),
                       df['marks'])

print(df)
   marks students
0   40.0     Jack
1   60.0     Jack
2   90.0     Jack
3   52.5     Jack
4  100.0     John
5   52.5     John
6   52.5     John
7   70.0     John
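`np.where` returns a plain NumPy array rather than a Series, so it does no index alignment; that is safe here because `transform` returns its result in the same row order as `df`. A self-contained sketch of this answer that writes to a new column (the name `corrected_marks` is only illustrative) so the original marks stay available:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "marks": [40, 60, 90, 20, 100, 10, 30, 70],
    "students": ["Jack", "Jack", "Jack", "Jack", "John", "John", "John", "John"],
})

student_mean = df.groupby("students")["marks"].transform("mean")

# np.where picks element-wise between the two arrays; mixing the int64
# marks with the float64 means yields a float64 ndarray
result = np.where(df["marks"] < 40, student_mean, df["marks"])
df["corrected_marks"] = result
print(df)
```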

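@mozway's answer is correct. You could also do it in two steps: first store each student's mean in a helper column, then choose between it and the original mark row by row: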

df['mean'] = df.groupby('students')['marks'].transform('mean')

df['final_marks'] = df.apply(lambda x: x['mean'] if (x['marks'] < 40) else x['marks'], axis=1)

print(df)

output:

   marks students  mean  final_marks
0     40     Jack  52.5         40.0
1     60     Jack  52.5         60.0
2     90     Jack  52.5         90.0
3     20     Jack  52.5         52.5
4    100     John  52.5        100.0
5     10     John  52.5         52.5
6     30     John  52.5         52.5
7     70     John  52.5         70.0
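`apply(..., axis=1)` calls the lambda once per row in Python, which is fine for a small frame but can be slow on large ones. A hedged sketch checking that the vectorized `where` from the first answer produces the same `final_marks` as this two-step approach, assuming the sample data from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "marks": [40, 60, 90, 20, 100, 10, 30, 70],
    "students": ["Jack", "Jack", "Jack", "Jack", "John", "John", "John", "John"],
})

# Step 1: per-student mean on every row
df["mean"] = df.groupby("students")["marks"].transform("mean")

# Step 2a: row-wise apply, as in this answer
row_wise = df.apply(lambda x: x["mean"] if x["marks"] < 40 else x["marks"], axis=1)

# Step 2b: vectorized equivalent, with no Python-level loop over rows
vectorized = df["marks"].where(df["marks"] >= 40, df["mean"])

df["final_marks"] = vectorized
print(row_wise.tolist() == vectorized.tolist())
```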

To apply your logic per student, group by student name with .groupby() and compute each student's (each group's) average with .transform('mean'), which returns a Series aligned with the original rows. You can then assign those means to the low marks with the same .loc mechanism you tried:

df.loc[df["marks"] < 40, "marks"] = df.groupby('students')['marks'].transform('mean')

Result:

print(df)

   marks students
0   40.0     Jack
1   60.0     Jack
2   90.0     Jack
3   52.5     Jack
4  100.0     John
5   52.5     John
6   52.5     John
7   70.0     John
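Note that the assigned means are floats while `marks` starts out as `int64`. Older pandas versions silently upcast the column to `float64`, which is why the result shows `40.0`; recent pandas versions (2.1 and later, as far as I know) emit a `FutureWarning` that this implicit upcasting on assignment is deprecated. A minimal sketch that sidesteps it by casting the column to float first (an addition, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    "marks": [40, 60, 90, 20, 100, 10, 30, 70],
    "students": ["Jack", "Jack", "Jack", "Jack", "John", "John", "John", "John"],
})

# Compute the per-student means from the original marks before modifying them
student_mean = df.groupby("students")["marks"].transform("mean")

# Cast up front so the float means fit the column without an implicit upcast
df["marks"] = df["marks"].astype(float)

# .loc aligns student_mean on the index, so only the masked rows are updated
df.loc[df["marks"] < 40, "marks"] = student_mean
print(df)
```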

If you actually want to assign each student's lowest mark (instead of the mean) to all of that student's marks below 40, use transform('min') instead. Start again from the original df, since the previous line has already overwritten marks:

df.loc[df["marks"] < 40, "marks"] = df.groupby('students')['marks'].transform('min')

Result:

print(df)

   marks students
0     40     Jack
1     60     Jack
2     90     Jack
3     20     Jack
4    100     John
5     10     John
6     10     John
7     70     John
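Because `transform` accepts the aggregation name as a string, the mean and min variants differ only in that argument. A small hypothetical helper, `fill_low_marks` (not from any library), that takes the threshold and aggregation as parameters and returns a new Series instead of modifying `df` in place:

```python
import pandas as pd


def fill_low_marks(df: pd.DataFrame, threshold: float = 40, agg: str = "mean") -> pd.Series:
    """Hypothetical helper: replace marks below `threshold` with the student's `agg`.

    The aggregate ('mean', 'min', ...) is computed over all of the student's
    marks, low ones included; the input DataFrame is left unchanged.
    """
    per_student = df.groupby("students")["marks"].transform(agg)
    return df["marks"].where(df["marks"] >= threshold, per_student)


df = pd.DataFrame({
    "marks": [40, 60, 90, 20, 100, 10, 30, 70],
    "students": ["Jack", "Jack", "Jack", "Jack", "John", "John", "John", "John"],
})

df["mean_filled"] = fill_low_marks(df, agg="mean")
df["min_filled"] = fill_low_marks(df, agg="min")
print(df)
```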
