
How to find the number of groups in multi-index groupby object in pandas?

My question is simple, but I could not find an answer anywhere I looked.

I want the number of distinct values at each level of the group keys of a multi-index pandas groupby object. Note that this is not the same as the number of elements in each group (use .size()), nor the overall number of groups (use len).

It is best to illustrate with an example.

Let's create a simple dataframe:

import pandas as pd
df = pd.DataFrame({'Group': ['gr1','gr1','gr2','gr2','gr3','gr3'],
                   'Kind': ['sweet','sour','sweet','sour','sweet','sour'],
                   'Values': [10,11,200,201,300,301]})

Now we group using two columns:

gr = df.groupby(['Group','Kind'])

This produces the desired groupby object. It has a total of six groups, as you can verify with:

len(gr)
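For contrast with what I am actually after, here is a small sketch of the two things I am not asking about: the overall number of groups and the number of rows in each group. It rebuilds the example frame so it runs on its own; the variable names n_groups and group_sizes are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Group': ['gr1', 'gr1', 'gr2', 'gr2', 'gr3', 'gr3'],
                   'Kind': ['sweet', 'sour', 'sweet', 'sour', 'sweet', 'sour'],
                   'Values': [10, 11, 200, 201, 300, 301]})
gr = df.groupby(['Group', 'Kind'])

# Overall number of groups: one per distinct (Group, Kind) pair.
n_groups = gr.ngroups          # same value as len(gr)

# Number of rows in each group, as a Series indexed by (Group, Kind).
group_sizes = gr.size()

print(n_groups)
print(group_sizes)
```

Neither of these says how many distinct values each key contributes on its own, which is the (3, 2) I want.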

I can now iterate through the groups:

for key, group in gr:
    print(key)

This produces the following:

('gr1', 'sour')
('gr1', 'sweet')
('gr2', 'sour')
('gr2', 'sweet')
('gr3', 'sour')
('gr3', 'sweet')

We can see that the first key level has 3 unique values and the second has 2.

What I am looking for is something that, given gr, returns (3, 2) without access to the original dataset from which gr was generated, and without iterating through the groupby object, building up a list, and finding its unique elements.

The shortest way I can think of might be

>>> gr.dtypes.index.levshape
(3, 2)

Basically, we need to get a handle on the groups in the form of a MultiIndex:

>>> gr.dtypes
              Group    Kind Values
Group Kind                        
gr1   sour   object  object  int64
      sweet  object  object  int64
gr2   sour   object  object  int64
      sweet  object  object  int64
gr3   sour   object  object  int64
      sweet  object  object  int64
>>> gr.dtypes.index
MultiIndex(levels=[['gr1', 'gr2', 'gr3'], ['sour', 'sweet']],
           labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
           names=['Group', 'Kind'])
>>> gr.dtypes.index.levels
FrozenList([['gr1', 'gr2', 'gr3'], ['sour', 'sweet']])
>>> gr.dtypes.index.levshape
(3, 2)
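If gr.dtypes is not available or warns in your pandas version (as far as I know it has been deprecated on groupby objects in newer releases), a sketch of the same idea is to take the MultiIndex from gr.size() instead. This is an alternative I am assuming is acceptable here, since any aggregation of gr carries the same group index; the names level_index, level_counts and counts_by_name are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Group': ['gr1', 'gr1', 'gr2', 'gr2', 'gr3', 'gr3'],
                   'Kind': ['sweet', 'sour', 'sweet', 'sour', 'sweet', 'sour'],
                   'Values': [10, 11, 200, 201, 300, 301]})
gr = df.groupby(['Group', 'Kind'])

# gr.size() is indexed by the group keys, so its index is the MultiIndex we need.
level_index = gr.size().index

# levshape gives the number of values stored in each level of the MultiIndex.
level_counts = level_index.levshape

# Optional: pair each count with the name of the key it belongs to.
counts_by_name = dict(zip(level_index.names, level_counts))

print(level_counts)
print(counts_by_name)
```

Note that levshape reports the length of each level as stored in the index, so it only matches the number of observed values when the index carries no unused level entries, as is the case in this example.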

Originally I was thinking

>>> pd.Series(gr.groups).index.levshape
(3, 2)

to manufacture a new index from the groups dictionary, but it looks like the information is already there in dtypes.
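For completeness, a variant of the groups-dictionary idea can be sketched by building the MultiIndex directly from the dictionary keys rather than going through a Series. This assumes the keys of gr.groups are (Group, Kind) tuples, which is what I see for a two-column groupby; keys_index and unique_per_level are names made up for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'Group': ['gr1', 'gr1', 'gr2', 'gr2', 'gr3', 'gr3'],
                   'Kind': ['sweet', 'sour', 'sweet', 'sour', 'sweet', 'sour'],
                   'Values': [10, 11, 200, 201, 300, 301]})
gr = df.groupby(['Group', 'Kind'])

# The keys of gr.groups are the group labels, one tuple per group.
keys_index = pd.MultiIndex.from_tuples(list(gr.groups))

# Number of values stored per level of the freshly built index.
print(keys_index.levshape)

# Equivalent count taken from the values actually present at each level.
unique_per_level = tuple(
    keys_index.get_level_values(i).nunique() for i in range(keys_index.nlevels)
)
print(unique_per_level)
```

This does touch every group key once, so it is closer to the "build a list and count uniques" approach than the index-based one-liner.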
