Pandas Python DataFrames: How to split dataframes

I have a df:

df = pd.DataFrame(np.random.randn(11,3))

           0         1         2
0   0.102645 -1.530977  0.408735
1   1.081442  0.615082 -1.457931
2   1.852951  0.360998  0.178162
3   0.726028  2.072609 -1.167996
4  -0.454453  1.310887 -0.969910
5  -0.098552 -0.718283  0.372660
6   0.334170 -0.347934 -0.626079
7  -1.034541 -0.496949 -0.287830
8   1.870277  0.508380 -2.466063
9   1.464942 -0.020060 -0.684136
10 -1.057930  0.295145  0.161727

How can I split this into a given number of subsections, say 2 for now?

Something like this:

           0         1         2
0   0.102645 -1.530977  0.408735
1   1.081442  0.615082 -1.457931
2   1.852951  0.360998  0.178162
3   0.726028  2.072609 -1.167996
4  -0.454453  1.310887 -0.969910

           0         1         2
5  -0.098552 -0.718283  0.372660
6   0.334170 -0.347934 -0.626079
7  -1.034541 -0.496949 -0.287830
8   1.870277  0.508380 -2.466063
9   1.464942 -0.020060 -0.684136
10 -1.057930  0.295145  0.161727

Ideally I would like to use np.array_split(df, 2), but it throws an error because it's not an array.
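One way to still use np.array_split is to split an array of row positions rather than the DataFrame itself, then select each group of rows with .iloc. This is a minimal sketch, assuming a reasonably recent pandas and NumPy; `split_by_positions` is a hypothetical helper name, not a pandas API.

```python
import numpy as np
import pandas as pd


def split_by_positions(df: pd.DataFrame, n: int) -> list[pd.DataFrame]:
    """Split df into n row-wise chunks whose sizes differ by at most one."""
    # np.array_split works on a plain NumPy array of row positions,
    # so the DataFrame itself is never passed to NumPy.
    position_groups = np.array_split(np.arange(len(df)), n)
    # .iloc selects rows by position, so this works with any index labels.
    return [df.iloc[positions] for positions in position_groups]


df = pd.DataFrame(np.random.randn(11, 3))
parts = split_by_positions(df, 2)
print([len(p) for p in parts])  # [6, 5]
```

This always returns exactly n pieces. Note that np.array_split gives the leftover rows to the first chunks, so with 11 rows the first piece has 6 rows rather than the 5 shown in the expected output above.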

Is there a built-in function to do this? I don't particularly want to use df.loc[a:b], because it's difficult to calculate the start and end depending on the given number of sub-dataframes needed.
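For what it's worth, the start and end of each piece can be computed with integer arithmetic: chunk k starts at row k * len(df) // n. The sketch below uses .iloc instead of .loc, since .loc slices by label and includes the end label, while .iloc slices by position and excludes the end. `split_by_bounds` is a hypothetical name used only for illustration.

```python
import numpy as np
import pandas as pd


def split_by_bounds(df: pd.DataFrame, n: int) -> list[pd.DataFrame]:
    """Split df into n contiguous row chunks using integer start/stop bounds."""
    total = len(df)
    # bounds[k] is the first row of chunk k, and bounds[n] == total.
    # Integer division spreads any remainder across the chunks.
    bounds = [k * total // n for k in range(n + 1)]
    # .iloc slices are end-exclusive, so consecutive chunks never overlap.
    return [df.iloc[start:stop] for start, stop in zip(bounds, bounds[1:])]


df = pd.DataFrame(np.random.randn(11, 3))
parts = split_by_bounds(df, 2)
print([len(p) for p in parts])  # [5, 6]
print(parts[1].index[0])  # 5
```

With 11 rows and n = 2 the bounds are [0, 5, 11], which reproduces the split shown in the question (rows 0 to 4, then rows 5 to 10).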

Try the following. It returns a list of up to n sub-dataframes (each of ceil(len(df) / n) rows, except possibly the last) which, if concatenated, give back the original dataframe.

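import math

def split(df, n):
    # Rows per chunk, rounded up so no rows are left over
    # (max(1, ...) avoids a zero step when df is empty)
    size = max(1, math.ceil(len(df) / n))
    # Slice by position with .iloc; the last chunk may be shorter
    return [df.iloc[i:i + size] for i in range(0, len(df), size)]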
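The same chunking can also be expressed by labelling each row with its chunk number and letting groupby collect the pieces. This is a minimal sketch, assuming the same chunk-size formula as split() above; `split_groupby` is a hypothetical name, not part of pandas.

```python
import math

import numpy as np
import pandas as pd


def split_groupby(df: pd.DataFrame, n: int) -> list[pd.DataFrame]:
    """Chunk df the same way as split(): chunks of ceil(len(df) / n) rows."""
    size = max(1, math.ceil(len(df) / n))
    # Row k gets chunk label k // size: 0 for the first `size` rows, then 1, ...
    labels = np.arange(len(df)) // size
    # Iterating a GroupBy yields (label, sub-DataFrame) pairs; keep the frames.
    return [chunk for _, chunk in df.groupby(labels)]


df = pd.DataFrame(np.random.randn(11, 3))
print([len(c) for c in split_groupby(df, 2)])  # [6, 5]
print(len(split_groupby(df.head(10), 6)))  # 5, not 6
```

As with split(), this can return fewer than n pieces when the rows do not divide evenly: 10 rows with n = 6 gives a chunk size of 2 and therefore five chunks. If exactly n pieces are required, the boundaries have to be computed from n directly rather than from a fixed chunk size.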

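Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.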

 
© 2020-2024 STACKOOM.COM