I have an assignment that asks me to determine whether the score difference between students in several classes is greater than 0.2, measured against one or more reference students in each class who hold the reference score.
Here is the example data frame:
import pandas as pd

df = pd.DataFrame({'student' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'class' : [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
                   'type' : ['top', 'top', 'low', 'mid', 'mid', 'mid', 'low', 'low', 'low', 'low'],
                   'score' : [1, .8, .3, .7, .7, .6, .1, .2, .1, .1]})
df
The algorithm should contain the following rules
So the final outcome will be
df2 = pd.DataFrame({'student' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'class' : [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
                    'type' : ['top', 'top', 'low', 'mid', 'mid', 'mid', 'low', 'low', 'low', 'low'],
                    'score' : [1, .8, .3, .7, .6, .6, .1, .2, .1, .1],
                    'outcome' : ['no', 'ref', 'yes', 'no', 'ref', 'ref', 'yes', 'yes', 'yes', 'yes']})
df2
I have some basic knowledge of pandas but I think this problem is too complicated for me. Do you have any ideas on how to go about it?
def final_output(df):
    # Group the rows by class and type (use the argument, not the global df2)
    groups = df.groupby(['class', 'type'])
    # cl maps each class to its reference student's score
    cl = {}
    for name, group in groups:
        # Groups are iterated in sorted order, so 'top' overrides 'mid' when both exist
        if 'top' in name[1]:
            cl[name[0]] = group['score'].min()
        elif 'mid' in name[1]:
            cl[name[0]] = group['score'].min()
    # Assign the reference score to every student in the corresponding class
    df['refer_score'] = df['class'].apply(lambda x: cl[x])
    # Absolute difference between the reference score and the student's own score
    df['diff'] = df.apply(lambda x: abs(x['refer_score'] - x['score']), axis=1)
    df['final_outcome'] = df['diff'].apply(lambda x: 'yes' if x > 0.2 else 'ref' if x == 0.0 else 'no')
    return df

output = final_output(df2)
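The same logic can also be written without the explicit loop. The following is only a sketch, and it assumes the rules are the ones implied by the code above: the reference score of a class is the lowest score among its 'top' students, falling back to the lowest 'mid' score when the class has no 'top' student. The names label_outcomes and threshold are made up for this example. Because the scores are floats, the comparisons use np.isclose instead of exact equality.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'student': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'class':   [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
    'type':    ['top', 'top', 'low', 'mid', 'mid', 'mid', 'low', 'low', 'low', 'low'],
    'score':   [1, .8, .3, .7, .6, .6, .1, .2, .1, .1],
})

def label_outcomes(frame, threshold=0.2):
    """Hypothetical helper: label each student 'ref', 'yes' or 'no'."""
    out = frame.copy()  # leave the caller's frame untouched
    # Lowest score per class among 'top' students, and among 'mid' students
    top_min = out.loc[out['type'] == 'top'].groupby('class')['score'].min()
    mid_min = out.loc[out['type'] == 'mid'].groupby('class')['score'].min()
    # Assumption: prefer the 'top' reference, fall back to 'mid' otherwise
    ref = top_min.combine_first(mid_min)
    out['refer_score'] = out['class'].map(ref)
    out['diff'] = (out['refer_score'] - out['score']).abs()
    # Tolerant float comparisons: a difference of (almost) 0 marks a reference
    # student, and a difference of (almost) exactly the threshold is not "greater"
    is_ref = np.isclose(out['diff'], 0.0)
    is_far = (out['diff'] > threshold) & ~np.isclose(out['diff'], threshold)
    out['final_outcome'] = np.select([is_ref, is_far], ['ref', 'yes'], default='no')
    return out

result = label_outcomes(df)
print(result[['student', 'class', 'type', 'score', 'final_outcome']])
```

On the scores from df2 this gives the same labels as the expected 'outcome' column. A class with neither a 'top' nor a 'mid' student gets a missing reference score in this sketch, and its students are labelled 'no'.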