Group by and aggregate in pandas
  code_presentation code_module  score  id_student  id_assessment  date_submitted
0             2013J         AAA   78.0       11391           1752              18
1             2013J         AAA   70.0       11391           1800              22
2             2013J         AAA   72.0       31604           1752              17
3             2013J         AAA   69.0       31604           1800              26
.....
I need to count the submission dates and group the data correctly to get the following result:
id_student  id_assessment  date_submitted
11391       1752           1
            1800           1
31604       1752           1
            1800           1
... ETC
I tried:
analasys_grouped = analasys.groupby('id_student', as_index=False)\
    .agg({'id_assessment': 'count', 'date_submitted': 'count'})
analasys_grouped
But it does not work correctly.
If I understand you correctly, you want to apply value_counts() to id_assessment, grouped by id_student. Try:
assessment_count_per_student = df.groupby('id_student')['id_assessment'].value_counts()
print(assessment_count_per_student)
id_student  id_assessment
11391       1752             1
            1800             1
31604       1752             1
            1800             1
Name: id_assessment, dtype: int64
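
To make this reproducible, here is a minimal sketch that rebuilds only the four sample rows shown in the question (the elided rows are not reproduced) and applies the same call. The column name n_submissions used when flattening the result is a hypothetical label chosen for this example, not something from the original post.

```python
import pandas as pd

# Only the four rows visible in the question; the remaining data is elided there.
df = pd.DataFrame({
    "code_presentation": ["2013J"] * 4,
    "code_module": ["AAA"] * 4,
    "score": [78.0, 70.0, 72.0, 69.0],
    "id_student": [11391, 11391, 31604, 31604],
    "id_assessment": [1752, 1800, 1752, 1800],
    "date_submitted": [18, 22, 17, 26],
})

# For each student, count how many times each assessment id occurs.
assessment_count_per_student = df.groupby("id_student")["id_assessment"].value_counts()

# The result is a Series with a two-level index (id_student, id_assessment).
print(assessment_count_per_student)

# Optional: turn the index levels back into ordinary columns.
# Renaming first avoids a name clash with the id_assessment index level;
# "n_submissions" is a hypothetical label for this sketch.
flat = assessment_count_per_student.rename("n_submissions").reset_index()
print(flat)
```

Renaming the Series before reset_index() keeps the sketch independent of the name pandas assigns to the counts, which may differ between versions.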
You need to pass id_assessment to the groupby call as well.
df.groupby(['id_student', 'id_assessment'])['date_submitted'].count()
id_student  id_assessment
11391       1752             1
            1800             1
31604       1752             1
            1800             1
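In your attempt, you grouped only by id_student and then counted the assessments and submission dates, so you got one row per student rather than one row per student and assessment.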
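
The difference between the two groupings can be sketched side by side. This example assumes only the four sample rows from the question and keeps just the three columns involved; the variable names per_student and per_pair are illustrative.

```python
import pandas as pd

# Four sample rows from the question, restricted to the relevant columns.
df = pd.DataFrame({
    "id_student": [11391, 11391, 31604, 31604],
    "id_assessment": [1752, 1800, 1752, 1800],
    "date_submitted": [18, 22, 17, 26],
})

# Grouping by id_student alone gives one row per student,
# so each count covers both of that student's assessments.
per_student = df.groupby("id_student", as_index=False).agg(
    {"id_assessment": "count", "date_submitted": "count"}
)
print(per_student)

# Grouping by both keys gives one row per (student, assessment) pair,
# which matches the layout requested in the question.
per_pair = df.groupby(["id_student", "id_assessment"])["date_submitted"].count()
print(per_pair)
```

Note that count() skips missing values in date_submitted; if rows with a missing date should still be counted, size() on the same grouping may be closer to what is wanted.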