Grouping dataframes in pandas efficiently?
I have the following dataframe in pandas, where there's a unique index (employee) for each row and also a group label (type):
import pandas

df = pandas.DataFrame({"employee": ["a", "b", "c", "d"], "type": ["X", "Y", "Y", "Y"], "value": [10, 20, 30, 40]})
df = df.set_index("employee")
I want to group the employees by type and then calculate a statistic for each type. How can I do this and get a final dataframe that is type x statistic, for example type x (mean of value for each type)?
I tried using groupby:
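# The lambda runs once per row and looks up that row's "type" by label (.loc replaces the removed .ix).
g = df.groupby(lambda x: df.loc[x, "type"])
# numeric_only=True skips the string "type" column, which newer pandas refuses to average.
result = g.mean(numeric_only=True)
This is inefficient since it looks up each row's type through the index of df (with .loc, or .ix in older pandas) once for every row. Is there a better way?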
Like @sza says, you can use:
In [11]: g = df.groupby("type")
In [12]: g.mean()
Out[12]:
      value
type
X        10
Y        30
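As a sanity check, here is a minimal sketch (assuming pandas 1.x or later; the names slow and fast are only illustrative) that rebuilds the question's frame and confirms that the per-row lambda and grouping by the column name give the same means. Passing the column name lets pandas compute the group labels from the whole column at once, instead of calling a Python function and a .loc lookup for every row.

```python
import pandas as pd

df = pd.DataFrame({"employee": ["a", "b", "c", "d"],
                   "type": ["X", "Y", "Y", "Y"],
                   "value": [10, 20, 30, 40]}).set_index("employee")

# Per-row approach from the question: one Python call and one .loc lookup per row.
# numeric_only=True drops the string "type" column before averaging.
slow = df.groupby(lambda emp: df.loc[emp, "type"]).mean(numeric_only=True)

# Column-name approach: pandas builds the group labels from the "type" column directly.
# Only "value" is left to aggregate, since "type" is the group key and "employee" is the index.
fast = df.groupby("type").mean()

print(slow)
print(fast)
```

Both results have one row per type (X, Y) and a single "value" column; the only visible difference is that fast keeps "type" as the index name, while the lambda-based index is unnamed.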
See the groupby docs for more.
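To get several statistics per type in one table (the "type x statistic" shape asked about), one option is to pass a list of function names to agg on the grouped column. This is a minimal sketch; the particular statistics chosen here (mean, min, max, count) are just an illustration.

```python
import pandas as pd

df = pd.DataFrame({"employee": ["a", "b", "c", "d"],
                   "type": ["X", "Y", "Y", "Y"],
                   "value": [10, 20, 30, 40]}).set_index("employee")

# One row per type, one column per statistic computed on "value".
stats = df.groupby("type")["value"].agg(["mean", "min", "max", "count"])
print(stats)
```

For type Y this gives a mean of 30.0, a min of 20, a max of 40 and a count of 3; type X has a single row with value 10.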