
Splitting large dataframe into list of smaller pandas dataframes

I have a dataframe that looks like this:

    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7

Is there a way I can split it up into a list of pandas dataframes, each with one row and the same header as the big dataframe? I can loop over it of course, but is there a more Pythonic solution?

The use case is:

with Pool(processes=5) as p:
    p.starmap(parallel_func, list(single_row_of_dataframe))

Option 1
np.split

for i in np.split(df, np.arange(1, len(df))):
    print(i, '\n')

   y  gdp  cap
0  1    2    5 

   y  gdp  cap
1  2    3    9 

   y  gdp  cap
2  8    7    2 

   y  gdp  cap
3  3    4    7 

   y  gdp  cap
4  6    7    7 

If your dataframe has the default integer index (0, 1, 2, …), you can use the index values themselves as the split positions:

for i in np.split(df, df.index[1:]):
    print(i, '\n')

Note that np.split, at its heart, is a loop implementation, so you aren't really escaping the iteration.


splits = np.split(df, df.index[1:])
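For intuition, the same list can be built by hand with iloc slices at each split point; this is roughly what np.split ends up doing internally (a simplified sketch, not the actual NumPy implementation):

# Hand-rolled equivalent of np.split(df, df.index[1:]) for a default RangeIndex
bounds = [0, *df.index[1:], len(df)]
splits = [df.iloc[start:stop] for start, stop in zip(bounds, bounds[1:])]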

Option 2 Looping over df.index and calling loc:

splits = [df.loc[[i]] for i in df.index]
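Tying either option back to the use case in the question: if parallel_func takes a single one-row dataframe (i.e. only one argument), Pool.map is a more natural fit than starmap. A minimal sketch, with a hypothetical parallel_func:

from multiprocessing import Pool

import pandas as pd

def parallel_func(row_df):
    # Hypothetical worker: sum the values in the single row
    return float(row_df.to_numpy().sum())

if __name__ == '__main__':
    df = pd.DataFrame({'y': [1, 2, 8, 3, 6],
                       'gdp': [2, 3, 7, 4, 7],
                       'cap': [5, 9, 2, 7, 7]})
    splits = [df.loc[[i]] for i in df.index]  # Option 2 from above
    with Pool(processes=5) as p:
        results = p.map(parallel_func, splits)
    print(results)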

Fleshing out a discussion in the comments here: if you're looking to do some sort of parallelisation, look into dask dataframes. Don't try to implement your own parallelisation with Pool; you'll actually suffer performance drops.
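For example, a minimal sketch assuming dask is installed (pip install "dask[dataframe]"); the per-partition function here is just a placeholder:

import dask.dataframe as dd
import pandas as pd

df = pd.DataFrame({'y': [1, 2, 8, 3, 6],
                   'gdp': [2, 3, 7, 4, 7],
                   'cap': [5, 9, 2, 7, 7]})

# Partition the pandas dataframe; dask schedules the per-partition work itself
ddf = dd.from_pandas(df, npartitions=5)

# map_partitions applies a function to each partition (itself a small pandas dataframe)
result = ddf.map_partitions(lambda part: part.sum(axis=1)).compute()
print(result)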

Alternatively, you can use floor division (//) with groupby. Here I split the dataframe into chunks of 3 rows; change the divisor to whatever chunk size you need:

[df1 for _,df1 in df.groupby(np.arange(len(df))//3)]
Out[356]: 
[   y  gdp  cap
 0  1    2    5
 1  2    3    9
 2  8    7    2,    y  gdp  cap
 3  3    4    7
 4  6    7    7]
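The same pattern gives one dataframe per row, as asked in the question, by grouping on the position of each row directly (this assumes numpy is imported as np, as in the snippets above):

# One single-row dataframe per group, matching the original question
single_rows = [chunk for _, chunk in df.groupby(np.arange(len(df)))]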
