Pandas groupby aggregation

Question

Let's say we have a pandas DataFrame like the one below.

> category       level       score
>   Bus          travel      0.75
>   Bus          travel      0.60
>   Bus          vehicles    0.50

I want to group by 'level' and compute the count and the maximum score for each level. The hard part is producing output shaped like this:

> category   travel  score    vehicles  score
>  Bus         2     0.75        1       0.5

This is what I have tried so far:

    grouped = df.groupby('level').agg({
        'category': 'count',
        'score': 'max'
    })

Any ideas?

Answer 1

Setup

from io import StringIO
import pandas as pd

text = """category       level       score
   Bus          travel      0.75
   Bus          travel      0.60
   Bus          vehicles    0.50"""

# Parse the whitespace-separated sample table
df = pd.read_csv(StringIO(text), sep=r'\s+')

print(df)

  category     level  score
0      Bus    travel   0.75
1      Bus    travel   0.60
2      Bus  vehicles   0.50

Solution

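# For each category, count the rows and take the max score per level,
# then pivot 'level' from the index into the columns.
gdf = df.groupby('category').apply(
    lambda g: g.groupby('level')['score'].agg(['count', 'max'])).unstack()

# Put 'level' above the statistic name and keep each level's columns together.
gdf.columns = gdf.columns.swaplevel(0, 1)
gdf = gdf.sort_index(axis=1)

print(gdf)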

level    travel       vehicles     
          count   max    count  max
category                           
Bus           2  0.75        1  0.5
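
A possible variant, offered as a sketch rather than as part of the accepted answer: grouping on both keys at once should give the same table without the nested `apply`. The flattened column names (`travel_count`, `travel_max`, and so on) are an assumption made for illustration; the question's layout repeats the header `score`, which a DataFrame can hold but which is awkward to select by name.

```python
import pandas as pd

# Same sample data as in the setup above, built directly
df = pd.DataFrame({
    "category": ["Bus", "Bus", "Bus"],
    "level": ["travel", "travel", "vehicles"],
    "score": [0.75, 0.60, 0.50],
})

# Group on both keys, aggregate, then move 'level' from the index to the columns
wide = (
    df.groupby(["category", "level"])["score"]
      .agg(["count", "max"])
      .unstack("level")
)

# Columns are (statistic, level); swap so 'level' is on top, then sort
wide = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)

# Optional: flatten the two-level header into single names (naming is an assumption)
flat = wide.copy()
flat.columns = [f"{level}_{stat}" for level, stat in wide.columns]
flat = flat.reset_index()

print(flat)
```

If a category has no rows for some level, `unstack` fills those cells with NaN, so the count columns would become floats in that case.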

Answer 1 (accepted), 2016-05-17 09:02:36
