Pandas pivot table group summary

Given the following data frame:

import numpy as np
import pandas as pd
df = pd.DataFrame({'group':['s','s','s','p','p','p'],
                   'section':['a','b','b','a','a','b']
                   })

  group section
0     s       a
1     s       b
2     s       b
3     p       a
4     p       a
5     p       b

For each group, I'd like the number of sections, plus the maximum and minimum number of rows per section. Like this:

group  section count  max  min
s                  2    2    1
p                  2    2    1

You can achieve this by grouping on 'group', generating the value_counts, and then grouping again:

In [91]:
df.groupby('group')['section'].apply(pd.Series.value_counts).groupby(level=1).agg(['nunique','max','min'])

Out[91]:
   nunique  max  min
a        2    2    1
b        2    2    1
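The two-step idea above can be sketched as follows, assuming the six-row frame from the question. The variable names (counts, per_group, per_section) are illustrative only. The first grouping yields a Series indexed by (group, section); the level passed to the second groupby then decides whether the summary is per group (level=0) or per section (level=1).

```python
import pandas as pd

df = pd.DataFrame({'group': ['s', 's', 's', 'p', 'p', 'p'],
                   'section': ['a', 'b', 'b', 'a', 'a', 'b']})

# Step 1: rows per section within each group.
# The result is a Series with a two-level index: (group, section).
counts = df.groupby('group')['section'].value_counts()
print(counts)

# Step 2: group the counts again on one index level.
per_group = counts.groupby(level=0).agg(['max', 'min'])    # one row per group
per_section = counts.groupby(level=1).agg(['max', 'min'])  # one row per section
print(per_group)
print(per_section)
```

With this particular data both summaries happen to contain the same numbers, so the difference between the two levels only shows up in the index labels.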

To get close to the desired result, you can do this:

In [102]:
df.groupby('group')['section'].apply(pd.Series.value_counts).reset_index().drop('level_1', axis=1).groupby('group',as_index=False).agg(['nunique','max','min'])

Out[102]:
      section        
      nunique max min
group                
p           2   2   1
s           2   2   1

IIUC, you can use:

import pandas as pd
import numpy as np

df = pd.DataFrame({'group':['s','s','s','s','p','p','p','p','p'],
                   'section':['b','b','b','a','a','b','a','a','b']
                   })
print(df)
  group section
0     s       b
1     s       b
2     s       b
3     s       a
4     p       a
5     p       b
6     p       a
7     p       a
8     p       b

print(df.groupby(['group', 'section']).size())
group  section
p      a          3
       b          2
s      a          1
       b          3
dtype: int64

print(df.groupby(['group', 'section']).size().groupby(level=1).agg([len, 'min', 'max']))
         len  min  max
section               
a          2    1    3
b          2    2    3

Or maybe you can change len to nunique, which counts the distinct size values rather than the number of entries (the two differ when sizes repeat):

print(df.groupby(['group', 'section']).size().groupby(level=1).agg(['nunique', 'min', 'max']))
         nunique  min  max
section                   
a              2    1    3
b              2    2    3

Or, if you need it by the first level of the MultiIndex:

print(df.groupby(['group', 'section']).size().groupby(level=0).agg([len, 'min', 'max']))
       len  min  max
group               
p        2    2    3
s        2    1    3

print(df.groupby(['group', 'section']).size().groupby(level=0).agg(['nunique', 'min', 'max']))
       nunique  min  max
group                   
p            2    2    3
s            2    1    3
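In the outputs above, len and nunique give the same numbers, but that depends on the data. A small sketch with a made-up frame (not from the question), in which group 's' has two sections of equal size, shows where they diverge: len counts the sections, while nunique counts the distinct size values.

```python
import pandas as pd

# Hypothetical data: in group 's', sections 'a' and 'b' both have 2 rows.
df = pd.DataFrame({'group': ['s', 's', 's', 's', 'p', 'p', 'p'],
                   'section': ['a', 'a', 'b', 'b', 'a', 'a', 'b']})

sizes = df.groupby(['group', 'section']).size()

# len counts entries per group; nunique counts distinct size values.
summary = sizes.groupby(level=0).agg([len, 'nunique'])
print(summary)
```

So if the goal is the number of sections per group, len (or 'count') is probably the safer choice.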
