How can I convert this data frame into a dictionary of DataFrames, split on the numpy.nan rows?
import pandas
import numpy
names = ['a', 'b', 'c']
df = pandas.DataFrame([1, 2, 3, numpy.nan, 4, 5, 6, numpy.nan, 7, 8, 9])
>>> df
0
0 1.0
1 2.0
2 3.0
3 NaN
4 4.0
5 5.0
6 6.0
7 NaN
8 7.0
9 8.0
10 9.0
Desired output:
df_dict = {'a': <df1>, 'b': <df2>, 'c': <df3>}
with
df1 =
0
0 1.0
1 2.0
2 3.0
df2 =
0
4 4.0
5 5.0
6 6.0
df3 =
0
8 7.0
9 8.0
10 9.0
Use a dict comprehension with groupby:
d = {names[i]: x.dropna() for i, x in df.groupby(df[0].isnull().cumsum())}
{'c': 0
0 7.0
1 8.0
2 9.0, 'b': 0
0 4.0
1 5.0
2 6.0, 'a': 0
0 1.0
1 2.0
2 3.0}
print (d['a'])
0
0 1.0
1 2.0
2 3.0
print (d['b'])
0
4 4.0
5 5.0
6 6.0
print (d['c'])
0
8 7.0
9 8.0
10 9.0
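The cumsum trick works because `isnull()` marks each NaN separator row, and the running sum then assigns every row between separators the same group label. A minimal sketch of that intermediate step, using the same example frame:

```python
import pandas as pd
import numpy as np

# Rebuild the example frame from the question.
df = pd.DataFrame([1, 2, 3, np.nan, 4, 5, 6, np.nan, 7, 8, 9])

# Each NaN bumps the running count, so rows between NaNs share a label.
# Note the NaN row itself carries the *new* label, which is why each
# group is passed through dropna() afterwards.
labels = df[0].isnull().cumsum()
print(labels.tolist())  # [0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

Grouping on `labels` therefore yields exactly three groups, which `names[i]` maps to `'a'`, `'b'`, and `'c'`.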
Another method uses numpy.array_split:
import numpy as np
dic = {names[i]: j.dropna() for i,j in enumerate(np.array_split(df, np.where(df[0].isnull())[0]))}
%%timeit
dic = {names[i]: j.dropna() for i,j in enumerate(np.array_split(df, np.where(df[0].isnull())[0]))}
100 loops, best of 3: 2.51 ms per loop

%%timeit
d = {names[i]: x.dropna() for i, x in df.groupby(df[0].isnull().cumsum())}
100 loops, best of 3: 6.1 ms per loop
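Here `np.where(df[0].isnull())[0]` gives the positional indices of the NaN rows, and `array_split` cuts the frame at those positions, so every chunk after the first still begins with its NaN separator row. That is why the comprehension calls `dropna()` on each piece. A small sketch of those intermediates:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame([1, 2, 3, np.nan, 4, 5, 6, np.nan, 7, 8, 9])

# Positions of the NaN separator rows become the split points.
split_points = np.where(df[0].isnull())[0]
print(split_points)  # [3 7]

# Chunks are df[0:3], df[3:7], df[7:11]; the 2nd and 3rd each
# start with a NaN row, removed later by dropna().
chunks = np.array_split(df, split_points)
print([len(c) for c in chunks])  # [3, 4, 4]
```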
Here's one way. Originally:
In [2109]: df_dict = dict(zip(
               names,
               [g.dropna() for _, g in df.groupby(df[0].isnull().cumsum())]
           ))
While editing, I realized it's identical to another answer:
In [2100]: df_dict = {names[i]: g.dropna() for i, g in df.groupby(df[0].isnull().cumsum())}
In [2101]: df_dict['a']
Out[2101]:
0
0 1.0
1 2.0
2 3.0
In [2102]: df_dict['b']
Out[2102]:
0
4 4.0
5 5.0
6 6.0
In [2103]: df_dict['c']
Out[2103]:
0
8 7.0
9 8.0
10 9.0