
Reshaping Pandas groupby data row values into column headers

I am trying to extract grouped row data from a pandas groupby object so that the primary group key ('course' in the example below) acts as the row index, the secondary group values ('student') act as column headers, and the aggregate values ('score') fill the corresponding rows.

So, for example, I would like to transform:

import pandas as pd
import numpy as np

data = {'course_id':[101,101,101,101,102,102,102,102] ,
    'student_id':[1,1,2,2,1,1,2,2],
    'score':[80,85,70,60,90,65,95,80]}

df = pd.DataFrame(data, columns=['course_id', 'student_id','score'])

Which I have grouped by course_id and student_id:

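# Mean score per (course_id, student_id) pair; the string 'mean' avoids the
# FutureWarning that recent pandas versions raise for aggregate(np.mean)
group = df.groupby(['course_id', 'student_id']).aggregate('mean')
g = pd.DataFrame(group)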

Into something like this:

data = {'course':[101,102],'1':[82.5,77.5],'2':[65.0,87.5]}
g3 = pd.DataFrame(data, columns=['course', '1', '2'])

I have spent some time looking through the groupby documentation and I have trawled Stack Overflow and the like, but I'm still not sure how to approach the problem. I would be very grateful if anyone could suggest a sensible way of achieving this for a largish dataset.

Many thanks!

  • Edited: to fix g3 example typo
>>> g.reset_index().pivot(index='course_id', columns='student_id', values='score')
student_id     1     2
course_id             
101         82.5  65.0
102         77.5  87.5
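
For what it's worth, the same table can probably be reached without the reset_index step. The sketch below rebuilds the question's df and shows two alternatives that should give the same result: unstack on the grouped means, and pivot_table, which aggregates and reshapes in one call. The names wide_unstack, wide_pivot and flat are made up for illustration, and the last step (renaming course_id to course and turning the student ids into string headers) is only an assumption about how closely the output needs to match g3.

```python
import pandas as pd

data = {'course_id': [101, 101, 101, 101, 102, 102, 102, 102],
        'student_id': [1, 1, 2, 2, 1, 1, 2, 2],
        'score': [80, 85, 70, 60, 90, 65, 95, 80]}
df = pd.DataFrame(data, columns=['course_id', 'student_id', 'score'])

# Option 1: average per (course_id, student_id), then move the inner
# index level (student_id) into the column headers.
wide_unstack = (df.groupby(['course_id', 'student_id'])['score']
                  .mean()
                  .unstack('student_id'))

# Option 2: pivot_table groups, aggregates and reshapes in a single call.
wide_pivot = df.pivot_table(index='course_id', columns='student_id',
                            values='score', aggfunc='mean')

# Optional: flatten to a plain frame shaped like g3 in the question
# (a 'course' column plus one string-named column per student).
flat = wide_pivot.reset_index().rename(columns={'course_id': 'course'})
flat.columns = [str(c) for c in flat.columns]

print(wide_pivot)
print(flat)
```

Unlike pivot, pivot_table does not require each (course_id, student_id) pair to be unique, so it can be applied to the raw df directly.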
