How can I split this dataframe by blank spaces?
How can I convert this data frame into a dictionary of dataframes, split at the numpy.nan rows?
import pandas
import numpy
names = ['a', 'b', 'c']
df = pandas.DataFrame([1,2,3,numpy.nan, 4,5,6,numpy.nan, 7, 8,9])
>>> df
0
0 1.0
1 2.0
2 3.0
3 NaN
4 4.0
5 5.0
6 6.0
7 NaN
8 7.0
9 8.0
10 9.0
Desired output:
df_dict = {'a': <df1>, 'b': <df2>, 'c': <df3>}
with
df1 =
0
0 1.0
1 2.0
2 3.0
df2 =
4 4.0
5 5.0
6 6.0
df3 =
8 7.0
9 8.0
10 9.0
Use a dict comprehension with groupby:
d = {names[i]: x.dropna() for i, x in df.groupby(df[0].isnull().cumsum())}
print (d)
{'c': 0
8 7.0
9 8.0
10 9.0, 'b': 0
4 4.0
5 5.0
6 6.0, 'a': 0
0 1.0
1 2.0
2 3.0}
print (d['a'])
0
0 1.0
1 2.0
2 3.0
print (d['b'])
0
4 4.0
5 5.0
6 6.0
print (d['c'])
0
8 7.0
9 8.0
10 9.0
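To see why this works, it can help to look at the grouping key on its own. The following is a minimal sketch using the question's data; the intermediate names is_sep and key are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

names = ['a', 'b', 'c']
df = pd.DataFrame([1, 2, 3, np.nan, 4, 5, 6, np.nan, 7, 8, 9])

# True on the NaN separator rows, False everywhere else.
is_sep = df[0].isnull()

# Running count of separators seen so far: every row between two
# separators gets the same label (0, 1, 2, ...).
key = is_sep.cumsum()
print(key.tolist())

# Each separator row falls at the start of the block it opens,
# so dropna() removes it from that block.
d = {names[i]: x.dropna() for i, x in df.groupby(key)}
print(d['b'])
```

Note that names[i] relies on the labels starting at 0, which holds here because the first row is not NaN.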
Another method is to split with numpy.array_split at the positions of the NaN rows, i.e.:
import numpy as np
dic = {names[i]: j.dropna() for i,j in enumerate(np.array_split(df, np.where(df[0].isnull())[0]))}
%%timeit
dic = {names[i]: j.dropna() for i,j in enumerate(np.array_split(df, np.where(df[0].isnull())[0]))}
100 loops, best of 3: 2.51 ms per loop

%%timeit
d = {names[i]: x.dropna() for i, x in df.groupby(df[0].isnull().cumsum())}
100 loops, best of 3: 6.1 ms per loop
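Passing the DataFrame itself to np.array_split may emit a warning on newer pandas versions (an assumption worth checking against your installed version). A sketch of the same idea that splits the row positions instead and then selects with iloc; the names cuts and chunks are illustrative, not part of the original answer:

```python
import numpy as np
import pandas as pd

names = ['a', 'b', 'c']
df = pd.DataFrame([1, 2, 3, np.nan, 4, 5, 6, np.nan, 7, 8, 9])

# Positions of the NaN separator rows.
cuts = np.where(df[0].isnull())[0]

# Split the row positions (a plain integer array) at those cut points.
chunks = np.array_split(np.arange(len(df)), cuts)

# Select each chunk by position, then drop the separator row it starts with.
dic = {name: df.iloc[pos].dropna() for name, pos in zip(names, chunks)}
print(dic['b'])
```

The original index labels (4, 5, 6 for 'b') are preserved, as in the desired output.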
Here's one way.
Originally:
In [2109]: df_dict = dict(zip(
names,
[g.dropna() for _, g in df.groupby(df[0].isnull().cumsum())]
))
While editing, I realized this is identical to another answer:
In [2100]: df_dict = {names[i]: g.dropna() for i, g in df.groupby(df[0].isnull().cumsum())}
In [2101]: df_dict['a']
Out[2101]:
0
0 1.0
1 2.0
2 3.0
In [2102]: df_dict['b']
Out[2102]:
0
4 4.0
5 5.0
6 6.0
In [2103]: df_dict['c']
Out[2103]:
0
8 7.0
9 8.0
10 9.0
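The zip form and the names[i] form agree on the frame above, but they are not interchangeable in every case: zip pairs names with groups by position, while names[i] uses the cumulative-sum label as a list index. A minimal sketch of the difference, using a hypothetical frame df2 (not from the question) that starts with a NaN row, so the labels begin at 1:

```python
import numpy as np
import pandas as pd

names = ['a', 'b']
# Hypothetical data: the first row is already a separator.
df2 = pd.DataFrame([np.nan, 1, 2, np.nan, 3, 4])

key = df2[0].isnull().cumsum()  # labels are 1 and 2, not 0 and 1

# Pairing by position is unaffected by where the labels start.
df_dict = dict(zip(names, [g.dropna() for _, g in df2.groupby(key)]))
print(df_dict['a'])

# Indexing names with the label would fail here, because names[2] does not exist.
try:
    {names[i]: g.dropna() for i, g in df2.groupby(key)}
    indexing_failed = False
except IndexError:
    indexing_failed = True
```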