I have the following dataframe:
import pandas as pd

df = pd.DataFrame({
    "marks": [40, 60, 90, 20, 100, 10, 30, 70],
    "students": ["Jack", "Jack", "Jack", "Jack", "John", "John", "John", "John"],
})
   marks students
0     40     Jack
1     60     Jack
2     90     Jack
3     20     Jack
4    100     John
5     10     John
6     30     John
7     70     John
I want to replace each student's marks below 40 with that student's average mark (the average includes the low marks themselves).
I know how to assign a value based on the < 40 condition (in this case I assigned the lowest mark of the whole df to all marks below 40), like so:
df.loc[df["marks"] < 40, "marks"] = df["marks"].min()
But I am not sure how to apply this per student, for example with a lambda function on the unique student names. Any help would be appreciated.
You can combine a groupby and where:
df['corrected_marks'] = df['marks'].where(df['marks']>=40,
(df.groupby('students')
['marks']
.transform('mean'))
)
output:
   marks students  corrected_marks
0     40     Jack             40.0
1     60     Jack             60.0
2     90     Jack             90.0
3     20     Jack             52.5
4    100     John            100.0
5     10     John             52.5
6     30     John             52.5
7     70     John             70.0
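To see why this works, it can help to look at the intermediate result. The following is a minimal, self-contained sketch of the same approach; the variable name student_mean is only illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "marks": [40, 60, 90, 20, 100, 10, 30, 70],
    "students": ["Jack"] * 4 + ["John"] * 4,
})

# transform("mean") returns one value per row (not one per group),
# aligned to the original index, so it can be used row by row.
student_mean = df.groupby("students")["marks"].transform("mean")

# where() keeps the marks that satisfy the condition and takes the
# value from student_mean everywhere else.
df["corrected_marks"] = df["marks"].where(df["marks"] >= 40, student_mean)
print(df)
```

Because transform() keeps the original shape, no merge or lookup by student name is needed.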
Try with np.where:
import numpy as np
df['marks'] = np.where(df['marks'] <40,
df.groupby('students')['marks'].transform('mean'),
df['marks'])
df
Out[18]:
   marks students
0   40.0     Jack
1   60.0     Jack
2   90.0     Jack
3   52.5     Jack
4  100.0     John
5   52.5     John
6   52.5     John
7   70.0     John
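If you would rather stay within pandas, Series.mask is a possible alternative (not something the answer above uses): it replaces values where the condition is True, which is the inverse of where. A minimal sketch, with student_mean as an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    "marks": [40, 60, 90, 20, 100, 10, 30, 70],
    "students": ["Jack"] * 4 + ["John"] * 4,
})

student_mean = df.groupby("students")["marks"].transform("mean")

# mask() replaces the entries where the condition holds (marks < 40)
# and leaves all other entries unchanged.
df["marks"] = df["marks"].mask(df["marks"] < 40, student_mean)
print(df)
```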
@mozway's answer is correct. You could also do it in two steps:
df['mean'] = df.groupby('students')['marks'].transform('mean')
df['final_marks'] = df.apply(lambda x: x['mean'] if (x['marks'] < 40) else x['marks'], axis=1)
print(df)
output:
   marks students  mean  final_marks
0     40     Jack  52.5         40.0
1     60     Jack  52.5         60.0
2     90     Jack  52.5         90.0
3     20     Jack  52.5         52.5
4    100     John  52.5        100.0
5     10     John  52.5         52.5
6     30     John  52.5         52.5
7     70     John  52.5         70.0
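The row-wise apply is easy to read but calls the lambda once per row. As a rough check, the sketch below compares it with a vectorized where on the same data; the names rowwise, vectorized and same are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "marks": [40, 60, 90, 20, 100, 10, 30, 70],
    "students": ["Jack"] * 4 + ["John"] * 4,
})
df["mean"] = df.groupby("students")["marks"].transform("mean")

# Row-wise version: the lambda runs once for every row.
rowwise = df.apply(
    lambda x: x["mean"] if x["marks"] < 40 else x["marks"], axis=1
)

# Vectorized version: a single call over the whole column.
vectorized = df["marks"].where(df["marks"] >= 40, df["mean"])

# True if both approaches produce the same values for every row.
same = bool((rowwise == vectorized).all())
print(same)
```

On a frame this small the difference does not matter; on large frames the vectorized form is usually the faster choice.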
To apply your logic per student, you can group by student names with .groupby() and get each student's (each group's) average with transform() on 'mean'. Then you can assign the mean values to marks using the same mechanism as in the code you tried, like below:
df.loc[df["marks"] < 40, "marks"] = df.groupby('students')['marks'].transform('mean')
Result:
print(df)
   marks students
0   40.0     Jack
1   60.0     Jack
2   90.0     Jack
3   52.5     Jack
4  100.0     John
5   52.5     John
6   52.5     John
7   70.0     John
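The assignment works even though the right-hand side has eight values and only three rows are selected, because pandas matches the values by index label. The sketch below spells that out. The cast to float64 is an addition of mine, on the assumption that newer pandas versions may warn or fail when float values are written into an integer column; low and student_mean are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    "marks": [40, 60, 90, 20, 100, 10, 30, 70],
    "students": ["Jack"] * 4 + ["John"] * 4,
})

# Assumption: cast up front so the float means fit the column dtype.
df["marks"] = df["marks"].astype("float64")

low = df["marks"] < 40  # boolean mask, True for rows 3, 5 and 6
student_mean = df.groupby("students")["marks"].transform("mean")

# student_mean has one value per row; .loc picks the values whose
# index labels match the selected rows and ignores the rest.
df.loc[low, "marks"] = student_mean
print(df)
```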
If you actually want to assign each student's lowest mark (instead of the mean) to all marks below 40 for that student, you can use transform() on 'min' instead:
df.loc[df["marks"] < 40, "marks"] = df.groupby('students')['marks'].transform('min')
Result:
print(df)
   marks students
0     40     Jack
1     60     Jack
2     90     Jack
3     20     Jack
4    100     John
5     10     John
6     10     John
7     70     John
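Since the mean and min variants differ only in the aggregation name, they could be wrapped in one small helper. This is only a sketch: replace_low_marks and its parameters are hypothetical names, and it returns a new Series instead of modifying df in place.

```python
import pandas as pd

def replace_low_marks(df, threshold=40, how="mean"):
    """Return marks with values below `threshold` replaced by a per-student aggregate.

    `how` is passed straight to transform(), e.g. "mean" or "min".
    """
    per_student = df.groupby("students")["marks"].transform(how)
    return df["marks"].where(df["marks"] >= threshold, per_student)

df = pd.DataFrame({
    "marks": [40, 60, 90, 20, 100, 10, 30, 70],
    "students": ["Jack"] * 4 + ["John"] * 4,
})

with_mean = replace_low_marks(df, how="mean")
with_min = replace_low_marks(df, how="min")
print(pd.DataFrame({"mean": with_mean, "min": with_min}))
```

Returning a new Series leaves the original marks untouched, so both variants can be computed from the same frame.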