
Splitting large dataframe into list of smaller pandas dataframes

I have a dataframe that looks like this:

    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7

Is there a way I can split it up into a list of pandas dataframes, each with one row and the same header as the big dataframe? I can loop over it of course, but is there a more Pythonic solution?

The use case is:

with Pool(processes=5) as p:
    p.starmap(parallel_func, list(single_row_of_dataframe))

Option 1
np.split

for i in np.split(df, np.arange(1, len(df))):
    print(i, '\n')

   y  gdp  cap
0  1    2    5 

   y  gdp  cap
1  2    3    9 

   y  gdp  cap
2  8    7    2 

   y  gdp  cap
3  3    4    7 

   y  gdp  cap
4  6    7    7 

If your dataframe has the default integer index (0, 1, 2, …), you can use the index values themselves as the split positions:

for i in np.split(df, df.index[1:]):
    print(i, '\n')

Note that np.split, at its heart, is a loop implementation, so you aren't really escaping the iteration.


splits = np.split(df, df.index[1:])
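For intuition, the same list can be built by hand with iloc slices at each split point; this is roughly what np.split ends up doing internally (a simplified sketch, not the actual NumPy implementation):

# Hand-rolled equivalent of np.split(df, df.index[1:]) for a default RangeIndex
bounds = [0, *df.index[1:], len(df)]
splits = [df.iloc[start:stop] for start, stop in zip(bounds, bounds[1:])]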

Option 2 Looping over df.index and calling loc:

splits = [df.loc[[i]] for i in df.index]
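Tying either option back to the use case in the question: if parallel_func takes a single one-row dataframe (i.e. only one argument), Pool.map is a more natural fit than starmap. A minimal sketch, with a hypothetical parallel_func:

from multiprocessing import Pool

import pandas as pd

def parallel_func(row_df):
    # Hypothetical worker: sum the values in the single row
    return float(row_df.to_numpy().sum())

if __name__ == '__main__':
    df = pd.DataFrame({'y': [1, 2, 8, 3, 6],
                       'gdp': [2, 3, 7, 4, 7],
                       'cap': [5, 9, 2, 7, 7]})
    splits = [df.loc[[i]] for i in df.index]  # Option 2 from above
    with Pool(processes=5) as p:
        results = p.map(parallel_func, splits)
    print(results)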

Fleshing out a discussion in the comments here: if you're looking to do some sort of parallelisation, look into dask dataframes. Don't try to implement your own parallelisation with Pool; you'll actually suffer performance drops.
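For example, a minimal sketch assuming dask is installed (pip install "dask[dataframe]"); the per-partition function here is just a placeholder:

import dask.dataframe as dd
import pandas as pd

df = pd.DataFrame({'y': [1, 2, 8, 3, 6],
                   'gdp': [2, 3, 7, 4, 7],
                   'cap': [5, 9, 2, 7, 7]})

# Partition the pandas dataframe; dask schedules the per-partition work itself
ddf = dd.from_pandas(df, npartitions=5)

# map_partitions applies a function to each partition (itself a small pandas dataframe)
result = ddf.map_partitions(lambda part: part.sum(axis=1)).compute()
print(result)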

Alternatively, you can use floor division (//) with groupby. Here I split the dataframe into chunks of 3 rows; change the divisor to whatever chunk size you need:

[df1 for _,df1 in df.groupby(np.arange(len(df))//3)]
Out[356]: 
[   y  gdp  cap
 0  1    2    5
 1  2    3    9
 2  8    7    2,    y  gdp  cap
 3  3    4    7
 4  6    7    7]
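The same pattern gives one dataframe per row, as asked in the question, by grouping on the position of each row directly (this assumes numpy is imported as np, as in the snippets above):

# One single-row dataframe per group, matching the original question
single_rows = [chunk for _, chunk in df.groupby(np.arange(len(df)))]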
