Compare numerical values using different reference rows in pandas data frame

Question

I have an assignment that asks me to determine whether the score difference between students within each of several classes is higher than 0.2. The comparison is made against one or more reference students picked in every class, whose score serves as the reference score.

Here is the example data frame

import pandas as pd

df = pd.DataFrame({'student' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     'class' : [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
     'type' : ['top', 'top', 'low', 'mid', 'mid', 'mid', 'low', 'low', 'low', 'low'],
     'score' : [1, .8, .3, .7, .6, .6, .1, .2, .1, .1]})
df

The algorithm should follow these rules:

1. Pick the reference student(s) in each class by giving priority first to 'top' and then to 'mid' performing students. If there are multiple candidates, pick whoever is closest to the base score 0.5. In the example, class 1 has two 'top' students and we pick the second one, whose score of 0.8 is closer to 0.5. Class 2 has no 'top' students, so we pick both 'mid' students with a score of 0.6, which is closer to 0.5 than the 'mid' student with 0.7.
2. Calculate the difference between every non-reference student's score and the reference score, and write 'yes' if the difference is > 0.2 or 'no' if it is <= 0.2.

So the final outcome will be

df2 = pd.DataFrame({'student' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     'class' : [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
     'type' : ['top', 'top', 'low', 'mid', 'mid', 'mid', 'low', 'low', 'low', 'low'],
     'score' : [1, .8, .3, .7, .6, .6, .1, .2, .1, .1],
     'outcome' : ['no', 'ref', 'yes', 'no', 'ref', 'ref', 'yes', 'yes', 'yes', 'yes']})
df2

I have some basic knowledge of pandas but I think this problem is too complicated for me. Do you have any ideas on how to go about it?

Answer 1

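import pandas as pd

def final_output(df):
    df = df.copy()
    # group the rows by class and type
    groups = df.groupby(['class', 'type'])

    # cl maps each class to its (reference type, reference score)
    cl = {}
    for (cls, typ), group in groups:
        # 'top' always wins; 'mid' is used only while the class has no 'top' entry
        if typ == 'top' or (typ == 'mid' and cls not in cl):
            # the reference score is the one closest to the base 0.5
            closest = (group['score'] - 0.5).abs().idxmin()
            cl[cls] = (typ, group.loc[closest, 'score'])

    # assign the reference type and score to every student of the class
    # (raises KeyError if a class has neither 'top' nor 'mid' students)
    df['refer_type'] = df['class'].apply(lambda x: cl[x][0])
    df['refer_score'] = df['class'].apply(lambda x: cl[x][1])
    # absolute difference to the reference score, rounded to absorb
    # floating-point noise near the 0.2 threshold
    df['diff'] = (df['refer_score'] - df['score']).abs().round(10)

    # students of the reference type that carry the reference score are 'ref'
    is_ref = (df['type'] == df['refer_type']) & (df['score'] == df['refer_score'])
    df['final_outcome'] = df['diff'].apply(lambda x: 'yes' if x > 0.2 else 'no')
    df.loc[is_ref, 'final_outcome'] = 'ref'
    return df

output = final_output(df)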
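
For comparison, the same rules can probably also be written without an explicit loop. The following is only a sketch, not the answerer's method. The function name label_outcome and the parameters base and threshold are made up for illustration, and the example frame uses the scores from the expected output in the question. It assumes that every class has at least one 'top' or 'mid' student (otherwise all students of that class end up as 'no'), and that all reference students of a class share the same score.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'student': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'class': [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
    'type': ['top', 'top', 'low', 'mid', 'mid', 'mid', 'low', 'low', 'low', 'low'],
    'score': [1, .8, .3, .7, .6, .6, .1, .2, .1, .1],
})

def label_outcome(df, base=0.5, threshold=0.2):
    """Hypothetical helper: label each student 'ref', 'yes' or 'no'."""
    out = df.copy()
    # Priority of each type: 'top' beats 'mid'; other types map to NaN
    # and can never become a reference.
    rank = out['type'].map({'top': 0, 'mid': 1})
    # Best priority available in each class (NaN values are skipped).
    best_rank = rank.groupby(out['class']).transform('min')
    candidate = rank.notna() & (rank == best_rank)
    # Distance to the base score, kept only for candidates.
    dist = (out['score'] - base).abs().where(candidate)
    best_dist = dist.groupby(out['class']).transform('min')
    # Candidates at the minimal distance are the reference students.
    is_ref = candidate & np.isclose(dist, best_dist)
    # Broadcast the reference score to every row of the class.
    ref_score = out['score'].where(is_ref).groupby(out['class']).transform('first')
    # Round so that e.g. 1 - 0.8 compares as exactly 0.2.
    diff = (out['score'] - ref_score).abs().round(10)
    out['outcome'] = np.where(is_ref, 'ref', np.where(diff > threshold, 'yes', 'no'))
    return out

result = label_outcome(df)
print(result['outcome'].tolist())
# ['no', 'ref', 'yes', 'no', 'ref', 'ref', 'yes', 'yes', 'yes', 'yes']
```

The rounding step matters because differences such as 0.8 - 0.6 are not exactly 0.2 in floating point, so a raw comparison against the threshold can go either way.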
