
Pandas groupby and get nunique of multiple columns in a dataframe

I have a dataframe like the one below:

stu_id,Mat_grade,sci_grade,eng_grade
1,A,C,A
1,A,C,A
1,B,C,A
1,C,C,A
2,D,B,B
2,D,C,B
2,D,D,C
2,D,A,C

import pandas as pd
tf = pd.read_clipboard(sep=',')  # run after copying the rows above to the clipboard

My objective is to:

a) Find out how many different unique grades a student got under Mat_grade, sci_grade and eng_grade

So I tried the following:

tf['mat_cnt'] = tf.groupby(['stu_id'])['Mat_grade'].nunique()
tf['sci_cnt'] = tf.groupby(['stu_id'])['sci_grade'].nunique()
tf['eng_cnt'] = tf.groupby(['stu_id'])['eng_grade'].nunique() 

But this doesn't provide the expected output. Since I have more than 100K unique ids, an efficient and elegant solution would be really helpful.

I expect my output to look like the below:

[Image: expected output]

You can specify the column names in a list and, for the columns in cols, call DataFrameGroupBy.nunique followed by rename:

cols = ['Mat_grade','sci_grade', 'eng_grade']
new = ['mat_cnt','sci_cnt','eng_cnt']
d = dict(zip(cols, new))
df = tf.groupby(['stu_id'], as_index=False)[cols].nunique().rename(columns=d)
print (df)
   stu_id  mat_cnt  sci_cnt  eng_cnt
0       1        3        1        1
1       2        1        4        2

Another idea is to use named aggregation:

cols = ['Mat_grade','sci_grade', 'eng_grade']
new = ['mat_cnt','sci_cnt','eng_cnt']
d = {v: (k,'nunique') for k, v in zip(cols, new)}
print (d)
{'mat_cnt': ('Mat_grade', 'nunique'), 
 'sci_cnt': ('sci_grade', 'nunique'), 
 'eng_cnt': ('eng_grade', 'nunique')}

df = tf.groupby(['stu_id'], as_index=False).agg(**d)
print (df)
   stu_id  mat_cnt  sci_cnt  eng_cnt
0       1        3        1        1
1       2        1        4        2
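
Both approaches above return one row per stu_id. The attempt in the question instead assigns the grouped result back to tf; that result is indexed by stu_id while tf is indexed by row number, so the values align on index labels rather than on students. If the counts are actually wanted as extra columns on every original row, one possible sketch (an assumption about the intent, not something the question states) is to use transform('nunique'), which returns a result with the same index as tf. The names counts and out, and the use of io.StringIO in place of the clipboard, are illustrative choices only.

```python
import io

import pandas as pd

# Same sample data as in the question, read from a string instead of the clipboard
csv_text = """stu_id,Mat_grade,sci_grade,eng_grade
1,A,C,A
1,A,C,A
1,B,C,A
1,C,C,A
2,D,B,B
2,D,C,B
2,D,D,C
2,D,A,C
"""
tf = pd.read_csv(io.StringIO(csv_text))

cols = ['Mat_grade', 'sci_grade', 'eng_grade']
new = ['mat_cnt', 'sci_cnt', 'eng_cnt']
d = dict(zip(cols, new))

# transform keeps the original row index, so each row receives
# the number of distinct grades of its own stu_id group
counts = tf.groupby('stu_id')[cols].transform('nunique').rename(columns=d)

# join on the shared row index to attach the counts to the original rows
out = tf.join(counts)
print(out)
```

Here out keeps all eight rows of tf, with the per-student counts repeated on each row of that student.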
