
Grouping pandas dataframe by blocks using identical values

I have a big dataframe which is structured as follows:
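| Type | col2 | col3 |
| ---- | ---- | ---- |
| A    |      |      |
| B    |      |      |
| C    |      |      |
| C    |      |      |
| C    |      |      |
| B    |      |      |
| C    |      |      |
| C    |      |      |
| C    |      |      |
| A    |      |      |
| B    |      |      |
| C    |      |      |
| C    |      |      |
| B    |      |      |
| C    |      |      |
| A    |      |      |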

The types form a hierarchy: each A has one or more Bs, and each B has one or more Cs. I would like to split this dataframe into 2 different kinds of chunks:

  • 1 chunk from A until the next A (all B's with C's for each A)
  • 1 chunk within each A chunk, from B until the next B (all C's for each B within an A)

How can I do this?

IIUC, you want col2 to have groups starting with A and col3 subgroups starting with B:

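# each 'A' row starts a new group: the running count of A rows is the group id
df['col2'] = df['Type'].eq('A').cumsum()
# within each A group, each 'B' row starts a new subgroup (0 marks the A row itself)
df['col3'] = df['Type'].eq('B').groupby(df['col2']).cumsum()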

output:

   Type  col2  col3
0     A     1     0
1     B     1     1
2     C     1     1
3     C     1     1
4     C     1     1
5     B     1     2
6     C     1     2
7     C     1     2
8     C     1     2
9     A     2     0
10    B     2     1
11    C     2     1
12    C     2     1
13    B     2     2
14    C     2     2
15    A     3     0
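
To reproduce the output above from scratch, here is a minimal self-contained sketch. The sample frame is rebuilt from the Type values in the question, and `label_blocks` is a hypothetical helper name of my own, not a pandas function.

```python
import pandas as pd


def label_blocks(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with block ids in col2 (A blocks) and col3 (B blocks).

    label_blocks is a hypothetical helper name, not part of pandas.
    """
    out = df.copy()
    # Each 'A' row starts a new block: the running count of A rows is the block id.
    out['col2'] = out['Type'].eq('A').cumsum()
    # Within each A block, each 'B' row starts a new sub-block (0 = the A row itself).
    out['col3'] = out['Type'].eq('B').groupby(out['col2']).cumsum()
    return out


# Sample data rebuilt from the Type column in the question.
df = pd.DataFrame({'Type': list('ABCCCBCCCABCCBCA')})
df = label_blocks(df)
print(df)
```

The trick is that comparing with `eq` gives a boolean column, and the cumulative sum of booleans increases by one exactly at each row that starts a new block.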

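You can then group by col2/col3. The mask m keeps only rows where both ids are nonzero, which drops the A rows, so each group is one B with its Cs: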

m = df[['col2', 'col3']].ne(0).all(axis=1)
for name, g in df[m].groupby(['col2', 'col3']):
    print(f'group {name}')
    print(g)

output:

group (1, 1)
  Type  col2  col3
1    B     1     1
2    C     1     1
3    C     1     1
4    C     1     1
group (1, 2)
  Type  col2  col3
5    B     1     2
6    C     1     2
7    C     1     2
8    C     1     2
group (2, 1)
   Type  col2  col3
10    B     2     1
11    C     2     1
12    C     2     1
group (2, 2)
   Type  col2  col3
13    B     2     2
14    C     2     2
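
To get both kinds of chunks the question asks for, one option is to nest the two groupings. This is only a sketch on the sample data rebuilt from the question; the names `a_chunks` and `b_chunks` are illustrative and not from the original answer.

```python
import pandas as pd

# Sample data rebuilt from the Type column in the question.
df = pd.DataFrame({'Type': list('ABCCCBCCCABCCBCA')})
df['col2'] = df['Type'].eq('A').cumsum()
df['col3'] = df['Type'].eq('B').groupby(df['col2']).cumsum()

# Chunk kind 1: one frame per A block (the A row plus all its Bs and Cs).
a_chunks = {a_id: g for a_id, g in df.groupby('col2')}

# Chunk kind 2: within each A block, one frame per B block (the B row plus its Cs).
# Rows with col3 == 0 are the A rows themselves, so they are dropped first.
b_chunks = {
    (a_id, b_id): g
    for a_id, a_chunk in a_chunks.items()
    for b_id, g in a_chunk[a_chunk['col3'].ne(0)].groupby('col3')
}

for key, g in b_chunks.items():
    print(key, g['Type'].tolist())
```

An A block with no B rows (such as the last row of the sample) still appears in `a_chunks` but contributes nothing to `b_chunks`.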


 