
pandas DataFrame: group by a number of rows

If you have a pandas DataFrame({'a':[1,2,3,4,5,6,7,8,9]}), is there a simple way to split it into groups of 3 rows, or of any other fixed size?

I understand this can be done by adding an extra column that holds a grouping key: for example, you could join the above DataFrame to [1,1,1,2,2,2,3,3,3] and group by the added column. But it seems like it shouldn't be necessary to add an extra column for this operation.

I could also create an array of boundary indexes with np.linspace(0,9,4) and loop over its values, using them as slice bounds for DataFrame.ix[], but that doesn't seem fast for large DataFrames.
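For reference, the boundary-loop idea can be sketched as follows. This is only an illustration under a few assumptions: it uses .iloc for positional slicing, since .ix is no longer available in current pandas releases, and the names bounds and chunks are made up for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 6, 7, 8, 9]})

# Boundaries 0, 3, 6, 9. linspace returns floats, so cast to int
# before using the values as slice bounds.
bounds = np.linspace(0, len(df), 4).astype(int)

# Slice by position between each pair of consecutive boundaries.
# `chunks` is a hypothetical name for the resulting list of DataFrames.
chunks = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

for chunk in chunks:
    print(chunk)
```

Each slice is cheap on its own, but the Python-level loop is what makes this approach feel slow when there are many small groups.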

Am I missing a simpler way?

==Solution==

From the answers below, my preferred solution is numpy.array_split: unlike numpy.split, it doesn't raise an exception when the division is unequal. Instead of the number of pieces, you can also pass an array of indexes to split at. With the line below you can split a DataFrame (df) into smaller DataFrames of x rows each:

split_df = np.array_split(df, np.arange(x, len(df), x))

split_df is a list of the split DataFrames; the last one holds fewer than x rows when len(df) is not a multiple of x. The split points start at x rather than 0 because a split point at 0 would put an empty piece at the front of the list.
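To make the difference between the two functions and the effect of the split points concrete, here is a small sketch. It deliberately runs on a plain NumPy array rather than a DataFrame, because how these functions treat a DataFrame may depend on the pandas and NumPy versions installed; the variable names are illustrative only.

```python
import numpy as np

arr = np.arange(10)

# array_split tolerates an uneven division: 10 items into 3 pieces.
pieces = np.array_split(arr, 3)
sizes = [len(p) for p in pieces]

# split insists on equal pieces and raises ValueError otherwise.
try:
    np.split(arr, 3)
    split_failed = False
except ValueError:
    split_failed = True

x = 4  # desired number of items per piece

# Split points starting at 0 produce an empty first piece.
with_zero = np.array_split(arr, np.arange(0, len(arr), x))

# Split points starting at x avoid the empty leading piece.
from_x = np.array_split(arr, np.arange(x, len(arr), x))

print(sizes, split_failed)
print([len(p) for p in with_zero])
print([len(p) for p in from_x])
```

Either form of split points gives the same non-empty pieces; the only difference is whether the list starts with an empty element that has to be skipped.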

Based on your example DataFrame:

In [25]: df.index // 3
Out[25]: Int64Index([0, 0, 0, 1, 1, 1, 2, 2, 2], dtype=int64)

In [26]: for k, g in df.groupby(df.index // 3):
    ...:     print(k, g)
    ...:     
0    a
0  1
1  2
2  3
1    a
3  4
4  5
5  6
2    a
6  7
7  8
8  9
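Dividing the index only works when the index is the default 0, 1, 2, ... sequence. A minimal sketch of the same idea for any index, assuming you want groups by row position: build the key from np.arange(len(df)) instead. The names n, key, groups and sums are illustrative and not part of the answer above.

```python
import numpy as np
import pandas as pd

# A DataFrame whose index is not 0..8, so df.index // 3 would not work.
df = pd.DataFrame({"a": range(1, 10)}, index=list("abcdefghi"))

n = 3  # rows per group

# Row positions 0..8 floor-divided by n give the keys 0,0,0,1,1,1,2,2,2.
key = np.arange(len(df)) // n

# Group without adding a column: groupby accepts an array of keys
# with one entry per row.
groups = {k: g for k, g in df.groupby(key)}

# Aggregations work the same way.
sums = df.groupby(key)["a"].sum()

print(groups[1])
print(sums)
```

Nothing is added to the DataFrame here; the key array exists only for the duration of the groupby call.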

Here is another method that uses numpy.split or numpy.array_split:

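import numpy as np
import pandas as pd

df = pd.DataFrame({"A": np.arange(9), "B": np.arange(10, 19)},
                  index=np.arange(100, 109))
# np.split needs the row count to divide evenly into 3 pieces
for tmp in np.split(df, 3):
    print(tmp)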

The output is:

     A   B
100  0  10
101  1  11
102  2  12
     A   B
103  3  13
104  4  14
105  5  15
     A   B
106  6  16
107  7  17
108  8  18
