Comparing values against different reference rows in a pandas DataFrame

Question

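I have an assignment that asks me to check, for students in several classes, whether each student's score differs by more than 0.2 from a reference score. The reference score comes from one or more reference students selected within each class.

Here is a sample DataFrame: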

import pandas as pd

df = pd.DataFrame({'student' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     'class' : [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
     'type' : ['top', 'top', 'low', 'mid', 'mid', 'mid', 'low', 'low', 'low', 'low'],
     'score' : [1, .8, .3, .7, .7, .6, .1, .2, .1, .1]})
df

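The algorithm should follow these rules:

When selecting reference students, 'top' students take priority, followed by 'mid' students; if there are several candidates, pick whichever is closest to the base value 0.5. (In class 1 of the example there are two 'top' students, and we choose the second one because 0.8 is closer to 0.5. In class 2 there are no 'top' students, so we choose the 'mid' student with 0.6, which is closer to 0.5 than the students with 0.7.)
Compute the difference between each non-reference student's score and the reference score; write 'yes' if the difference is > 0.2 and 'no' if it is <= 0.2.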
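
To make the two rules concrete, here is a minimal plain-Python sketch applied to class 1 only. The names BASE, PRIORITY, and pick_reference are hypothetical and exist just for this illustration; it assumes each student is a (type, score) pair and is not meant as the pandas solution.

```python
# Illustrative only: BASE, PRIORITY, and pick_reference are made-up names.
BASE = 0.5
PRIORITY = ['top', 'mid']  # 'low' students are never used as references

def pick_reference(students):
    """Return the (type, score) pairs chosen as references for one class."""
    for level in PRIORITY:
        candidates = [s for s in students if s[0] == level]
        if candidates:
            # Smallest distance to the base value among the candidates
            best = min(abs(score - BASE) for _, score in candidates)
            return [s for s in candidates if abs(s[1] - BASE) == best]
    return []

class_1 = [('top', 1.0), ('top', 0.8), ('low', 0.3)]
reference = pick_reference(class_1)
ref_score = reference[0][1]

# 'ref' for reference students, otherwise compare against the reference score
outcomes = ['ref' if s in reference
            else 'yes' if abs(s[1] - ref_score) > 0.2
            else 'no'
            for s in class_1]
print(outcomes)
```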

So the final result would be:

df2 = pd.DataFrame({'student' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
     'class' : [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
     'type' : ['top', 'top', 'low', 'mid', 'mid', 'mid', 'low', 'low', 'low', 'low'],
     'score' : [1, .8, .3, .7, .6, .6, .1, .2, .1, .1],
     'outcome' : ['no', 'ref', 'yes', 'no', 'ref', 'ref', 'yes', 'yes', 'yes', 'yes']})
df2

I have some basic knowledge of pandas, but I think this problem is too complex for me. Do you have any ideas?

Answer 1

def final_output(df):
    # Group the rows by class and type
    groups = df.groupby(['class', 'type'])

    # cl maps each class to the score of its reference student
    cl = {}
    for name, group in groups:
        # Groups are iterated in sorted key order ('low' < 'mid' < 'top'),
        # so a 'top' group overwrites the 'mid' entry of the same class
        if name[1] in ('top', 'mid'):
            # Take the score closest to the base value 0.5
            closest = (group['score'] - 0.5).abs().idxmin()
            cl[name[0]] = group.loc[closest, 'score']

    # Assign the reference score to every student of the respective class
    df['refer_score'] = df['class'].apply(lambda x: cl[x])
    # Absolute difference between the reference score and the student's score
    df['diff'] = (df['refer_score'] - df['score']).abs()

    # A zero difference is treated as the reference student
    df['final_outcome'] = df['diff'].apply(lambda x: 'yes' if x > 0.2 else 'ref' if x == 0.0 else 'no')
    return df

output = final_output(df2)
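
As a possible alternative, the sketch below avoids the explicit loop and marks reference students by their type and distance to 0.5 instead of by a zero difference, so a 'low' student who happens to share the reference score would not be labeled 'ref'. The function name label_outcomes, its parameters base and threshold, and the helper columns _rank and _dist are assumptions made for this example; the data is the expected-result frame from the question.

```python
import pandas as pd

df = pd.DataFrame({'student': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'class': [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
                   'type': ['top', 'top', 'low', 'mid', 'mid', 'mid', 'low', 'low', 'low', 'low'],
                   'score': [1, .8, .3, .7, .6, .6, .1, .2, .1, .1]})

def label_outcomes(df, base=0.5, threshold=0.2):
    """Hypothetical helper: label each student 'ref', 'yes', or 'no'."""
    out = df.copy()
    # Lower rank is preferred; 'low' maps to NaN and is never a reference
    out['_rank'] = out['type'].map({'top': 0, 'mid': 1})
    out['_dist'] = (out['score'] - base).abs()

    # Keep only the best-ranked type available in each class
    eligible = out.dropna(subset=['_rank'])
    best_rank = eligible.groupby('class')['_rank'].transform('min')
    eligible = eligible[eligible['_rank'] == best_rank]

    # Among those, keep the students closest to the base value (ties kept)
    best_dist = eligible.groupby('class')['_dist'].transform('min')
    refs = eligible[eligible['_dist'] == best_dist]

    # One reference score per class, mapped back onto every student
    ref_score = refs.groupby('class')['score'].first()
    diff = (out['score'] - out['class'].map(ref_score)).abs()

    # Classes without any 'top' or 'mid' student get NaN here and fall to 'no'
    out['outcome'] = ['yes' if d > threshold else 'no' for d in diff]
    out.loc[refs.index, 'outcome'] = 'ref'
    return out.drop(columns=['_rank', '_dist'])

result = label_outcomes(df)
print(result)
```

Because the comparison uses floating-point subtraction, scores that sit exactly on the 0.2 boundary may land on either side; rounding the difference first is one way to make that explicit.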

Solution 1
1 2018-08-09 12:55:54