Pandas groupby aggregation

Let's say we have a pandas DataFrame like the one below:

> category       level       score
>   Bus          travel      0.75
>   Bus          travel      0.60
>   Bus          vehicles    0.50

What I want is to group by 'level' and calculate the count and the maximum score for each level. The "hard" part is producing output like this:

> category   travel  score    vehicles  score
>  Bus         2     0.75        1       0.5

I have been trying this:

>     grouped = df.groupby('level').agg({
>         'category': 'count',
>         'score': 'max'
>     })
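A quick check of that attempt on the three sample rows (rebuilt here with `pd.DataFrame` rather than read from text) shows what it returns: the counts and maxima are correct, but there is one row per level and the category is no longer a row label, so the result still has to be reshaped into one row per category.

```python
import pandas as pd

# Same three rows as the sample data in the question
df = pd.DataFrame({
    "category": ["Bus", "Bus", "Bus"],
    "level": ["travel", "travel", "vehicles"],
    "score": [0.75, 0.60, 0.50],
})

# The attempt from the question: grouped by level only,
# counting non-null 'category' values and taking the max 'score'
grouped = df.groupby("level").agg({"category": "count", "score": "max"})
print(grouped)
```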

Any ideas?

Setup

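from io import StringIO  # wraps a string as a file-like object
import pandas as pd

text = """category       level       score
   Bus          travel      0.75
   Bus          travel      0.60
   Bus          vehicles    0.50"""

# Split columns on runs of whitespace (delim_whitespace is deprecated since pandas 2.2)
df = pd.read_csv(StringIO(text), sep=r'\s+')

print(df)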

  category     level  score
0      Bus    travel   0.75
1      Bus    travel   0.60
2      Bus  vehicles   0.50

Solution

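# For each category, group by level and compute the count and max of score,
# then unstack so each level becomes its own group of columns
gdf = df.groupby('category').apply(
    lambda g: g.groupby('level')['score'].agg(['count', 'max'])).unstack()

# Put level on the outer column level and the statistics underneath it
gdf.columns = gdf.columns.swaplevel(0, 1)
gdf = gdf.sort_index(axis=1)

print(gdf)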

level    travel       vehicles     
          count   max    count  max
category                           
Bus           2  0.75        1  0.5
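The same table can likely be produced without `apply` by grouping on both keys in a single `groupby` call and then unstacking `level`. The sketch below rebuilds the sample rows directly; the flattened labels such as `travel_count` are an illustrative naming choice of this sketch, since the header in the question repeats `score` for both maxima and duplicate column names are awkward to select.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["Bus", "Bus", "Bus"],
    "level": ["travel", "travel", "vehicles"],
    "score": [0.75, 0.60, 0.50],
})

# Group on both keys at once: the result is indexed by (category, level)
stats = df.groupby(["category", "level"])["score"].agg(["count", "max"])

# Move 'level' from the row index into the columns, then make it the outer level
wide = stats.unstack("level")
wide.columns = wide.columns.swaplevel(0, 1)
wide = wide.sort_index(axis=1)

# Optional: collapse the two column levels into single labels like 'travel_count'
flat = wide.copy()
flat.columns = [f"{level}_{stat}" for level, stat in flat.columns]
print(flat)
```

On the sample data, `wide` should have the same layout and values as `gdf` above, and `flat` holds the same values under single-level column labels.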

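Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.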