Group by and aggregate in pandas
code_presentation code_module score id_student id_assessment date_submitted
0 2013J AAA 78.0 11391 1752 18
1 2013J AAA 70.0 11391 1800 22
2 2013J AAA 72.0 31604 1752 17
3 2013J AAA 69.0 31604 1800 26
.....
I need to count the submission days. How do I group the data correctly to get a result like this:
id_student  id_assessment  date_submitted
11391       1752           1
            1800           1
31604       1752           1
            1800           1
...
I tried:
analasys_grouped = analasys.groupby('id_student', as_index=False)\
    .agg({'id_assessment': 'count', 'date_submitted': 'count'})
analasys_grouped
but it does not work correctly.
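For illustration, here is a sketch that rebuilds the four sample rows from the question and runs the attempt on them. The DataFrame construction is an assumption made only for reproducibility, since the question does not show how analasys is loaded. Grouping by id_student alone yields one row per student, and both count columns simply hold the number of rows for that student.

```python
import pandas as pd

# Hand-built copy of the four sample rows shown in the question.
analasys = pd.DataFrame({
    "code_presentation": ["2013J"] * 4,
    "code_module": ["AAA"] * 4,
    "score": [78.0, 70.0, 72.0, 69.0],
    "id_student": [11391, 11391, 31604, 31604],
    "id_assessment": [1752, 1800, 1752, 1800],
    "date_submitted": [18, 22, 17, 26],
})

# Grouping by id_student alone collapses each student's rows into one row,
# so both aggregated columns just receive the row count per student.
attempt = analasys.groupby("id_student", as_index=False).agg(
    {"id_assessment": "count", "date_submitted": "count"}
)
print(attempt)  # one row per student; both count columns equal 2
```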
If I understand you correctly, you want to apply value_counts() to id_assessment, grouped by id_student. Try:
assessment_count_per_student = df.groupby('id_student')['id_assessment'].value_counts()
print(assessment_count_per_student)
id_student  id_assessment
11391       1752             1
            1800             1
31604       1752             1
            1800             1
Name: id_assessment, dtype: int64
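If a flat table is preferred over the MultiIndex Series, one option is to reset the index and give the counts column an explicit name. This is a sketch that uses only the relevant columns of the sample data, built by hand here as an assumption. The name= argument matters because in older pandas versions the Series returned by value_counts is named id_assessment (as in the output above), which collides with the index level of the same name; since pandas 2.0 it is named count instead.

```python
import pandas as pd

# Hand-built copy of the relevant columns from the question's sample data.
df = pd.DataFrame({
    "id_student": [11391, 11391, 31604, 31604],
    "id_assessment": [1752, 1800, 1752, 1800],
    "date_submitted": [18, 22, 17, 26],
})

counts = (
    df.groupby("id_student")["id_assessment"]
    .value_counts()
    # name= sets the counts column explicitly; without it, older pandas
    # versions fail because the Series is also called id_assessment.
    .reset_index(name="count")
)
print(counts)
```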
You need to pass id_assessment into the groupby statement as well:
df.groupby(['id_student', 'id_assessment'])['date_submitted'].count()
id_student  id_assessment
11391       1752             1
            1800             1
31604       1752             1
            1800             1
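To get the flat layout from the question, with id_student and id_assessment as regular columns and the count stored under date_submitted, a sketch could combine as_index=False with named aggregation (available since pandas 0.25). The sample DataFrame is again built by hand as an assumption.

```python
import pandas as pd

# Hand-built copy of the relevant columns from the question's sample data.
df = pd.DataFrame({
    "id_student": [11391, 11391, 31604, 31604],
    "id_assessment": [1752, 1800, 1752, 1800],
    "date_submitted": [18, 22, 17, 26],
})

# Named aggregation: output column = (input column, aggregation function).
# as_index=False keeps both group keys as ordinary columns.
result = df.groupby(["id_student", "id_assessment"], as_index=False).agg(
    date_submitted=("date_submitted", "count")
)
print(result)
```

Each (id_student, id_assessment) pair appears once in the sample, so every count is 1, matching the expected result in the question.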
In your attempt, you only group by id_student and then count id_assessment and date_submitted, so you get one row per student holding that student's total number of rows, instead of one row per student and assessment.
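One detail worth knowing about that count: GroupBy.count() skips missing values, while size() counts every row. The sketch below uses a hypothetical variant of the sample data in which one date_submitted value is missing, to show how the two differ.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data: one submission date is missing.
df = pd.DataFrame({
    "id_student": [11391, 11391, 31604, 31604],
    "id_assessment": [1752, 1800, 1752, 1800],
    "date_submitted": [18, np.nan, 17, 26],
})

per_student = df.groupby("id_student")

# count() ignores NaN, so only rows with a recorded date are counted.
print(per_student["date_submitted"].count())  # 11391 -> 1, 31604 -> 2

# size() counts rows regardless of missing values.
print(per_student.size())  # 11391 -> 2, 31604 -> 2
```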