Group by and aggregate in pandas
  code_presentation code_module  score  id_student  id_assessment  date_submitted
0             2013J         AAA   78.0       11391           1752              18
1             2013J         AAA   70.0       11391           1800              22
2             2013J         AAA   72.0       31604           1752              17
3             2013J         AAA   69.0       31604           1800              26
...
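I need to count the number of submission days per student and assessment. How do I group correctly to get the following result?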
id_student  id_assessment  date_submitted
11391       1752           1
            1800           1
31604       1752           1
            1800           1
... etc.
I tried:
analasys_grouped = analasys.groupby('id_student', as_index=False)\
    .agg({'id_assessment': 'count', 'date_submitted': 'count'})
analasys_grouped
But it does not give the result I expect.
If I understand you correctly, you want to apply value_counts() to id_assessment, grouped by id_student. Try:
assessment_count_per_student = df.groupby('id_student')['id_assessment'].value_counts()
print(assessment_count_per_student)
id_student  id_assessment
11391       1752             1
            1800             1
31604       1752             1
            1800             1
Name: id_assessment, dtype: int64
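For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds only the four sample rows shown in the question (the rest of the data is elided there), and the variable names df and counts are my own assumptions. The Name line printed for the result may vary with the pandas version.

```python
import pandas as pd

# Only the four sample rows from the question; the real frame is larger.
df = pd.DataFrame({
    "code_presentation": ["2013J"] * 4,
    "code_module": ["AAA"] * 4,
    "score": [78.0, 70.0, 72.0, 69.0],
    "id_student": [11391, 11391, 31604, 31604],
    "id_assessment": [1752, 1800, 1752, 1800],
    "date_submitted": [18, 22, 17, 26],
})

# Count how often each id_assessment occurs within each id_student group.
counts = df.groupby("id_student")["id_assessment"].value_counts()

# The result is a Series with a two-level index (id_student, id_assessment),
# so a single count can be looked up with a tuple.
print(counts)
print(counts.loc[(11391, 1752)])
```

Since value_counts() counts rows, it gives the same numbers as counting date_submitted only as long as date_submitted has no missing values.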
You need to pass id_assessment to the groupby call as well.
df.groupby(['id_student', 'id_assessment'])['date_submitted'].count()
id_student  id_assessment
11391       1752             1
            1800             1
31604       1752             1
            1800             1
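In your attempt, you grouped only by id_student, so the result has one row per student, with the assessments and submission dates counted across all of that student's rows.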
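To make the difference concrete, here is a rough sketch comparing the two groupings on the four sample rows from the question. The names attempt and fixed are made up for illustration, and reset_index() is just one way to get a flat table back.

```python
import pandas as pd

# Only the four sample rows from the question; the real frame is larger.
analasys = pd.DataFrame({
    "id_student": [11391, 11391, 31604, 31604],
    "id_assessment": [1752, 1800, 1752, 1800],
    "date_submitted": [18, 22, 17, 26],
})

# Grouping by id_student alone collapses each student to a single row,
# so both counts come out as 2 for every student in this sample.
attempt = analasys.groupby("id_student", as_index=False).agg(
    {"id_assessment": "count", "date_submitted": "count"}
)

# Grouping by both keys keeps one row per (student, assessment) pair.
fixed = (
    analasys.groupby(["id_student", "id_assessment"])["date_submitted"]
    .count()
    .reset_index()
)

print(attempt)
print(fixed)
```

If what you actually need is the number of distinct submission days rather than the number of rows, swapping count() for nunique() might be closer to that, but this is a guess at the intent.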