pandas dataframe groupby a number of rows

Given a pandas DataFrame such as pd.DataFrame({'a':[1,2,3,4,5,6,7,8,9]}), is there a simple way to split it into groups of 3 rows, or any other fixed number of rows?

I understand this can be done by adding an extra column of group labels: for example, you could join the DataFrame above to [1,1,1,2,2,2,3,3,3] and group by the added column. But adding a column should not be necessary for this operation.

I could also create an array of boundary indexes with np.linspace(0,9,4), loop over its values, and use them as parameters to DataFrame.ix[], but that doesn't seem fast for large DataFrames.

Am I missing a simpler way?

==Solution==

From the answers below, my preferred solution is numpy.array_split. Unlike numpy.split, it does not raise an exception when the rows cannot be divided equally. Instead of the number of resulting pieces, you can also pass an array of indexes to split at. The line below splits a DataFrame (df) into smaller DataFrames of x rows each:

split_df = np.array_split(df, np.arange(0, len(df),x))

split_df is a list. Its first element is empty, because the first split index is 0 and no rows come before it; the remaining elements are the split DataFrames.
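The same fixed-size chunking can also be sketched with plain positional slicing, which produces no empty leading piece and lets the last chunk be shorter. The helper name chunk_rows is hypothetical (it is not part of pandas or NumPy), and the 10-row example frame is only for illustration.

```python
import pandas as pd


def chunk_rows(df, x):
    """Return a list of consecutive DataFrames of at most x rows each.

    chunk_rows is a hypothetical helper, shown for illustration only.
    """
    if x <= 0:
        raise ValueError("x must be a positive integer")
    # Slice by position, so any index (not only 0..n-1) works.
    return [df.iloc[start:start + x] for start in range(0, len(df), x)]


df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
chunks = chunk_rows(df, 3)
for chunk in chunks:
    print(chunk)
```

With 10 rows and x = 3 this gives four pieces, the last holding the single leftover row.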

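Based on your example DataFrame, you can group by the index floor-divided by the group size. This relies on the default 0, 1, 2, ... index, and uses // so that the keys stay integers under Python 3:

In [25]: df.index // 3
Out[25]: Int64Index([0, 0, 0, 1, 1, 1, 2, 2, 2], dtype=int64)

In [26]: for k, g in df.groupby(df.index // 3):
    ...:     print(k, g)
    ...:     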
0    a
0  1
1  2
2  3
1    a
3  4
4  5
5  6
2    a
6  7
7  8
8  9
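Dividing the index only works when it is the default 0, 1, 2, ... range. As a sketch for an arbitrary index, the grouping key can be built from row positions instead. The names n, keys and sums are illustrative, and the letter index is an assumption chosen to show that the index values do not matter.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8, 9]},
                  index=list("abcdefghi"))  # non-numeric index on purpose
n = 3  # rows per group

# Row positions 0..8 floor-divided by n give keys 0,0,0,1,1,1,2,2,2.
keys = np.arange(len(df)) // n

# groupby accepts an array of keys, so no extra column is needed.
sums = df.groupby(keys)["a"].sum()
print(sums)
```

Because the key array is passed directly to groupby, the DataFrame itself is left unchanged.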

Here is another method that uses numpy.split or numpy.array_split:

import numpy as np
import pandas as pd

df = pd.DataFrame({"A":np.arange(9), "B":np.arange(10, 19)}, 
                  index=np.arange(100, 109))
# Split the 9 rows into 3 equal pieces and print each one.
for tmp in np.split(df, 3):
    print(tmp)

The output is:

     A   B
100  0  10
101  1  11
102  2  12
     A   B
103  3  13
104  4  14
105  5  15
     A   B
106  6  16
107  7  17
108  8  18
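How numpy.split and numpy.array_split differ on an unequal division can be checked on a plain NumPy array. This is a minimal sketch; the 10-element array and the variable names are illustrative.

```python
import numpy as np

arr = np.arange(10)  # 10 elements cannot be divided into 3 equal parts

# array_split tolerates the remainder: the leading pieces get the extra elements.
pieces = np.array_split(arr, 3)
sizes = [len(p) for p in pieces]
print(sizes)

# split insists on an equal division and raises ValueError otherwise.
try:
    np.split(arr, 3)
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)
```

In the 9-row example above both functions behave the same, since 9 divides evenly by 3.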

