
Group by consecutive index numbers

I was wondering if there is a way to group by consecutive index numbers and move the groups into different columns. Here is an example of the DataFrame I'm using:

0     19218.965703
1     19247.621650
2     19232.651322
9     19279.216956
10    19330.087371
11    19304.316973

My idea is to group by sequential index numbers and get something like this:

                 0             1
0     19218.965703  19279.216956    
1     19247.621650  19330.087371
2     19232.651322  19304.316973

I've been trying to split my data into blocks of 3 and then group by block, but I'm looking for something more general that can group and rearrange runs of sequential index numbers. Thank you!

Here is one way:

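from more_itertools import consecutive_groups
# one column per run of consecutive index labels, renumbered from 0
final = pd.concat([df.loc[list(i)].reset_index(drop=True)
                    for i in consecutive_groups(df.index)],axis=1)
final.columns = range(final.shape[1])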

              0             1
0  19218.965703  19279.216956
1  19247.621650  19330.087371
2  19232.651322  19304.316973
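For readers who would rather not install more_itertools, the grouping step can be sketched with the standard library alone. This is a minimal illustration of the same idea, not the library's own code; the helper name consecutive_runs is made up for this example, and it assumes the index labels are sorted integers.

```python
from itertools import groupby


def consecutive_runs(values):
    """Split a sorted iterable of integers into lists of consecutive values.

    Hypothetical helper for illustration only.
    """
    runs = []
    # Within a run of consecutive integers, value - position stays constant,
    # so that difference works as a grouping key.
    for _, grp in groupby(enumerate(values), key=lambda pair: pair[1] - pair[0]):
        runs.append([value for _, value in grp])
    return runs


runs = consecutive_runs([0, 1, 2, 9, 10, 11])
print(runs)  # [[0, 1, 2], [9, 10, 11]]
```

Each inner list can then be passed to df.loc[...] in the same way as the groups in the answer above.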

This is a groupby + pivot_table

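# m labels each run: a new run starts wherever the index step is not 1
m = df.index.to_series().diff().ne(1).cumsum()

# key is the row's position within its run; 0 is the label of the data column
(df.assign(key=df.groupby(m).cumcount())
    .pivot_table(index='key', columns=m, values=0))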

                1             2
0    19218.965703  19279.216956
1    19247.621650  19330.087371
2    19232.651322  19304.316973
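To see why diff().ne(1).cumsum() labels the runs, it can help to look at the intermediate values. The sketch below rebuilds the question's data under the assumption that the data column is named "0" (the name is a guess based on the other answers); the variable names are illustrative only.

```python
import pandas as pd

# Assumed reconstruction of the question's frame; the column name "0" is a guess.
df = pd.DataFrame(
    {"0": [19218.965703, 19247.621650, 19232.651322,
           19279.216956, 19330.087371, 19304.316973]},
    index=[0, 1, 2, 9, 10, 11],
)

idx = df.index.to_series()
step = idx.diff()          # NaN for the first row, then the gap to the previous label
starts = step.ne(1)        # True wherever a new run begins (NaN != 1 is True)
run_id = starts.cumsum()   # running count of run starts, used as the run label
pos = df.groupby(run_id).cumcount()  # 0, 1, 2, ... within each run

print(run_id.tolist())  # [1, 1, 1, 2, 2, 2]
print(pos.tolist())     # [0, 1, 2, 0, 1, 2]
```

The pair (pos, run_id) is exactly the (row, column) position of each value in the reshaped output, which is why the column labels start at 1 here.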

Create a new pandas.Series with a new pandas.MultiIndex

a = pd.factorize(df.index - np.arange(len(df)))[0]
b = df.groupby(a).cumcount()

pd.Series(df['0'].to_numpy(), [b, a]).unstack()

              0             1
0  19218.965703  19279.216956
1  19247.621650  19330.087371
2  19232.651322  19304.316973
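The key step in this answer is df.index - np.arange(len(df)): within a run of consecutive labels the index and the row position grow together, so their difference is constant, and it jumps when a run breaks. A small sketch of just that step, using the index values from the question:

```python
import numpy as np
import pandas as pd

index = pd.Index([0, 1, 2, 9, 10, 11])

# Label minus row position: constant inside each consecutive run.
offset = index - np.arange(len(index))

# factorize relabels the distinct offsets as 0, 1, ... in order of appearance.
codes, uniques = pd.factorize(offset)

print(offset.tolist())  # [0, 0, 0, 6, 6, 6]
print(codes.tolist())   # [0, 0, 0, 1, 1, 1]
```

This assumes the index is sorted and made of integers; with an unsorted index the offsets would no longer identify runs.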

Similar, but with more NumPy

a = pd.factorize(df.index - np.arange(len(df)))[0]
b = df.groupby(a).cumcount()

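# start from NaN so runs of unequal length are padded rather than left uninitialized
c = np.full((b.max() + 1, a.max() + 1), np.nan)
c[b, a] = np.ravel(df)
pd.DataFrame(c)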

              0             1
0  19218.965703  19279.216956
1  19247.621650  19330.087371
2  19232.651322  19304.316973

One way from pandas groupby

s = df.index.to_series().diff().ne(1).cumsum()  # run label for each row
pd.concat({x: y.reset_index(drop=True) for x, y in df['0'].groupby(s)}, axis=1)

              1             2
0  19218.965703  19279.216956
1  19247.621650  19330.087371
2  19232.651322  19304.316973

I think the other answers assume that every consecutive group has the same number of observations. My approach does not rely on that:

Prepare the data:

import pandas as pd
import numpy as np

df = pd.DataFrame(data={'data': [19218.965703, 19247.621650, 19232.651322, 19279.216956, 19330.087371, 19304.316973]}, index=[0, 1, 2, 9, 10, 11])

And the solution:

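# index minus row position is constant within a run; dense rank turns it into 1, 2, ...
df['Group'] = (df.index.to_series()-np.arange(df.shape[0])).rank(method='dense')
df = df.reset_index()  # expose the original index as an 'index' column
df['Observations'] = df.groupby(['Group'])['index'].rank()
df.pivot(index='Observations',columns='Group', values='data')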

Which returns:

Group                  1.0           2.0
1.0           19218.965703  19279.216956
2.0           19247.621650  19330.087371
3.0           19232.651322  19304.316973
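The point about groups of unequal length can be checked with a small made-up example. The sketch below uses invented values in which the second run has only two rows, and labels the runs with the diff/cumsum idea from the earlier answers rather than rank; the names run_id, pos and wide are illustrative.

```python
import pandas as pd

# Hypothetical data: the first run has three rows, the second only two.
s = pd.Series([10.0, 11.0, 12.0, 20.0, 21.0], index=[0, 1, 2, 9, 10])

run_id = s.index.to_series().diff().ne(1).cumsum()  # 1, 1, 1, 2, 2
pos = s.groupby(run_id).cumcount()                  # 0, 1, 2, 0, 1

# Pivot on (position within run, run label); missing cells become NaN.
wide = pd.DataFrame({"value": s, "run": run_id, "pos": pos}).pivot(
    index="pos", columns="run", values="value"
)
print(wide)
```

The shorter run is padded with NaN in its last row, so the reshape still works when the runs do not line up.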

My way:

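# assumption: 'groups' is a column labelling each consecutive run, built for example like this
df['groups'] = df.index.to_series().diff().ne(1).cumsum()
pd.concat([df[df['groups']==i][['0']].reset_index(drop=True) for i in df['groups'].unique()],axis=1)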

              0             0
0  19218.965703  19279.216956
1  19247.621650  19330.087371
2  19232.651322  19304.316973
