遍歷 Pandas DataFrame，多列

Question

parsed_df

  student_name, course_id,  weight,
0   A   1   10
1   B   1   10
2   C   1   10
3   A   1   40
4   B   1   40
5   C   1   40
6   A   1   50
7   B   1   50
8   C   1   50
9   A   2   40
10  C   2   40
11  A   2   60
12  C   2   60
13  A   3   90
14  B   3   90
15  C   3   90
16  A   3   10
17  B   3   10
18  C   3   10

遍歷每個學生和 course_id 以總結每個學生每門課程的權重的最佳方法是什么？ 權重的返回值應該每個加起來為 100，否則應該返回錯誤。

例如：calculated_df = parsed_df.groupby(['student_name','course_id'])['weight'].sum()

Answer 1

只是將總和與 100 進行比較？

>>> df.groupby(['student_name', 'course_id'])['weight'].sum().eq(100).reset_index()

  student_name  course_id  weight
0            A          1    True
1            A          2    True
2            A          3    True
3            B          1    True
4            B          3    True
5            C          1    True
6            C          2    True
7            C          3    True

Answer 2

import pandas as pd

df = pd.DataFrame({
    'student_name': ['A', 'B',  'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'C', 'A', 'C', 'A', 'B', 'C', 'A', 'B', 'C'],
    'course_id': [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3],
    'weight': [10, 10, 10, 40, 40, 40, 50, 50, 50, 40, 40, 60, 60, 90, 90, 90, 10, 10, 9]
})

df = df.groupby(by=['student_name', 'course_id']).agg({'weight': 'sum'})
df.reset_index(inplace=True)

print(df)
#  student_name  course_id  weight
#0            A          1     100
#1            A          2     100
#2            A          3     100
#3            B          1     100
#4            B          3     100
#5            C          1     100
#6            C          2     100
#7            C          3      99 -> Intentionally adjusted to raise Exception

#To check if there is weight, which is not 100
for i, row in df.iterrows():
    if row['weight'] != 100:
        raise Exception(f"Student {row['student_name']}\'s Course ID {row['course_id']}: {row['weight']}")

Exception: Student C's Course ID 3: 99

遍歷 Pandas DataFrame，多列

問題描述

2 個解決方案

解決方案1
0 2022-02-22 20:15:46

解決方案2
0 2022-02-22 20:27:58

遍歷 Pandas DataFrame，多列

問題描述

2 個解決方案

解決方案1 0 2022-02-22 20:15:46

解決方案2 0 2022-02-22 20:27:58

解決方案1
0 2022-02-22 20:15:46

解決方案2
0 2022-02-22 20:27:58