I have a dataframe that looks like this:
y gdp cap
0 1 2 5
1 2 3 9
2 8 7 2
3 3 4 7
4 6 7 7
Is there a way to split it into a list of pandas dataframes, each with one row and the same header as the big dataframe? I can loop over it, of course, but is there a more Pythonic solution?
The use case is:
with Pool(processes=5) as p:
    p.starmap(parallel_func, list(single_row_of_dataframe))
Option 1 np.split
for i in np.split(df, np.arange(1, len(df))):
    print(i, '\n')
y gdp cap
0 1 2 5
y gdp cap
1 2 3 9
y gdp cap
2 8 7 2
y gdp cap
3 3 4 7
y gdp cap
4 6 7 7
If your index is monotonic and matches the row positions (0, 1, 2, ..., as in the default index), you can use the index itself to split:
for i in np.split(df, df.index[1:]):
    print(i, '\n')
Note that np.split, at its heart, is a loop implementation, so you aren't really escaping the iteration.
splits = np.split(df, df.index[1:])
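Since np.split is given row positions, the same split can also be written with plain positional slices. This is only a minimal sketch, assuming the dataframe from the question; it does not depend on what the index labels are, and the variable names are just for illustration.

```python
import pandas as pd

# Rebuild the dataframe from the question.
df = pd.DataFrame({"y": [1, 2, 8, 3, 6],
                   "gdp": [2, 3, 7, 4, 7],
                   "cap": [5, 9, 2, 7, 7]})

# Slice one row at a time by position. Each slice is a DataFrame
# that keeps the original column header and the row's index label.
splits = [df.iloc[i:i + 1] for i in range(len(df))]

print(len(splits))
print(splits[0])
```

Like np.split, this is still a loop under the hood; it just makes the positional slicing explicit.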
Option 2 Looping over df.index and calling loc:
splits = [df.loc[[i]] for i in df.index]
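The double brackets in df.loc[[i]] matter here. A small sketch, using a shortened copy of the question's dataframe, of what each form returns:

```python
import pandas as pd

# First three rows of the dataframe from the question.
df = pd.DataFrame({"y": [1, 2, 8], "gdp": [2, 3, 7], "cap": [5, 9, 2]})

as_series = df.loc[0]    # scalar label: returns a Series, not a dataframe
as_frame = df.loc[[0]]   # list with one label: returns a 1-row DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(list(as_frame.columns))
```

Only the list form gives a one-row dataframe with the same header as the original, which is what the question asks for.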
Fleshing out a discussion in the comments here: if you're looking to do some sort of parallelisation, look into dask dataframes. Don't try to implement your own parallelisation with Pool; you'll likely suffer performance drops, since every single-row dataframe has to be pickled and sent to a worker process.
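If you do experiment with Pool.starmap anyway, note that it calls the function as func(*args) for each element, so each single-row dataframe has to be wrapped in a tuple (unpacking a dataframe directly would iterate over its column labels). The sketch below uses itertools.starmap as a serial stand-in with the same unpacking behaviour; the body of parallel_func is hypothetical, since the question does not show it.

```python
from itertools import starmap

import pandas as pd

df = pd.DataFrame({"y": [1, 2, 8, 3, 6],
                   "gdp": [2, 3, 7, 4, 7],
                   "cap": [5, 9, 2, 7, 7]})


def parallel_func(row_df):
    # Hypothetical worker: sums the values of a one-row dataframe.
    return int(row_df.to_numpy().sum())


splits = [df.loc[[i]] for i in df.index]

# One 1-tuple per call, so starmap passes the whole dataframe as a
# single positional argument.
args = [(row_df,) for row_df in splits]

# Serial stand-in for p.starmap(parallel_func, args).
results = list(starmap(parallel_func, args))
print(results)
```

With a function that takes a single argument, p.map(parallel_func, splits) would avoid the tuple wrapping altogether.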
Or you can use // and groupby. Here I split the dataframe into chunks of 3 rows; change the number to whatever you need (1 gives single-row dataframes):
[df1 for _,df1 in df.groupby(np.arange(len(df))//3)]
Out[356]:
[ y gdp cap
0 1 2 5
1 2 3 9
2 8 7 2, y gdp cap
3 3 4 7
4 6 7 7]
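The same idea wrapped in a small function, as a sketch. The name split_into_chunks is made up for illustration; a chunk size of 1 gives the single-row dataframes from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"y": [1, 2, 8, 3, 6],
                   "gdp": [2, 3, 7, 4, 7],
                   "cap": [5, 9, 2, 7, 7]})


def split_into_chunks(frame, chunk_size):
    # Hypothetical helper: integer-dividing the row positions gives
    # every run of chunk_size consecutive rows the same group key.
    keys = np.arange(len(frame)) // chunk_size
    return [chunk for _, chunk in frame.groupby(keys)]


single_rows = split_into_chunks(df, 1)
threes = split_into_chunks(df, 3)

print([len(chunk) for chunk in single_rows])
print([len(chunk) for chunk in threes])
```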