
How to divide a dataframe into 2 equal parts (first half of the rows and second half of the rows) in Python

I have a dataframe and need to split it into 2 equal dataframes.

The 1st dataframe would contain the top half of the rows and the 2nd would contain the remaining rows.

Please help me achieve this using Python.

It should work whether the number of rows is even or odd (with an odd number of rows, I would need to drop the last row to make the two halves equal).


Consider df:

In [122]: df
Out[122]: 
    id  days  sold  days_lag
0    1     1     1         0
1    1     3     0         2
2    1     3     1         2
3    1     8     1         5
4    1     8     1         5
5    1     8     0         5
6    2     3     0         0
7    2     8     1         5
8    2     8     1         5
9    2     9     2         1
10   2     9     0         1
11   2    12     1         3
12   3     4     5         6

Use numpy.array_split():

In [127]: import numpy as np

In [128]: def split_df(df):
     ...:     if len(df) % 2 != 0:  # Handling `df` with `odd` number of rows
     ...:         df = df.iloc[:-1, :]
     ...:     df1, df2 =  np.array_split(df, 2)
     ...:     return df1, df2
     ...: 

In [130]: df1, df2 = split_df(df)

In [131]: df1
Out[131]: 
   id  days  sold  days_lag
0   1     1     1         0
1   1     3     0         2
2   1     3     1         2
3   1     8     1         5
4   1     8     1         5
5   1     8     0         5

In [133]: df2
Out[133]: 
    id  days  sold  days_lag
6    2     3     0         0
7    2     8     1         5
8    2     8     1         5
9    2     9     2         1
10   2     9     0         1
11   2    12     1         3
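The same split can also be written with plain positional slicing via `iloc`, without passing the DataFrame through NumPy (some recent pandas versions may emit a deprecation warning when `np.array_split` is called on a DataFrame). The following is only a minimal sketch under that assumption; the helper name `split_in_half` is hypothetical and not part of the original answer, and the data is the `df` shown above.

```python
import pandas as pd


def split_in_half(df):
    """Return (first_half, second_half) with equal lengths.

    When len(df) is odd, the last row is left out of both halves.
    """
    half = len(df) // 2  # integer division: 13 rows -> 6 per half
    return df.iloc[:half], df.iloc[half:2 * half]


# Same 13-row frame as in the example above
df = pd.DataFrame({
    "id":       [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3],
    "days":     [1, 3, 3, 8, 8, 8, 3, 8, 8, 9, 9, 12, 4],
    "sold":     [1, 0, 1, 1, 1, 0, 0, 1, 1, 2, 0, 1, 5],
    "days_lag": [0, 2, 2, 5, 5, 5, 0, 5, 5, 1, 1, 3, 6],
})

df1, df2 = split_in_half(df)
print(len(df1), len(df2))  # 6 6
```

Here `df2` keeps the original index labels 6 to 11, as in the output above. If a fresh 0-based index is preferred, `df2.reset_index(drop=True)` should give that.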

With a simple example, you can try the following:

import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13], ['Tom', 20], ['Jerry', 25]]
# data = [['Alex', 10], ['Bob', 12], ['Clarke', 13], ['Tom', 20]]  # even-length case

half = len(data) // 2          # integer division, no float round-trip
data1 = data[:half]            # first half
data2 = data[half:2 * half]    # second half; leaves out the last row when len(data) is odd

# Cast only the numeric column to float; 'Name' holds strings and cannot be cast
df1 = pd.DataFrame(data1, columns=['Name', 'Age']).astype({'Age': float})
print("1st half:\n", df1)
df2 = pd.DataFrame(data2, columns=['Name', 'Age']).astype({'Age': float})
print("2nd Half:\n", df2)

Output:

D:\Python>python temp.py

1st half:
    Name   Age
0  Alex  10.0
1   Bob  12.0
2nd Half:
      Name   Age
0  Clarke  13.0
1     Tom  20.0
