Split a pandas dataframe into multiple dataframes if all rows are nan
I have the following dataframe.
a b c d
0 4.65 30.572857 133.899994 23.705000
1 4.77 30.625713 134.690002 23.225000
2 4.73 30.138571 132.250000 23.040001
3 5.07 30.082857 130.000000 23.290001
4 4.98 30.282858 133.520004 23.389999
5 NaN NaN NaN NaN
6 4.82 29.674286 127.349998 23.700001
7 4.83 30.092857 129.110001 24.254999
8 4.85 29.918571 127.349998 24.695000
9 4.70 29.418571 127.139999 24.424999
10 4.69 30.719999 127.610001 25.200001
11 NaN NaN NaN NaN
12 4.43 29.724285 126.620003 24.764999
13 NaN NaN NaN NaN
14 4.29 29.010000 120.309998 24.730000
15 4.11 29.420000 119.480003 25.035000
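I want to split this df into multiple dfs wherever there is a row in which all values are NaN.
I went through the following links but could not figure out how to apply them to my problem.
In my example, the output would be 4 dataframes with 5, 5, 1 and 2 rows. Please suggest a way forward.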
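Use isna, all, cumsum and groupby.
First we check whether all values in a row are NaN, then we use cumsum to turn that check into a group indicator, and finally we use groupby to store the resulting dataframes in a list: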
grps = df.isna().all(axis=1).cumsum()
# how="all" drops only the all-NaN separator rows, not rows with a single NaN
dfs = [g.dropna(how="all") for _, g in df.groupby(grps)]
for part in dfs:
    print(part)
a b c d
0 4.65 30.572857 133.899994 23.705000
1 4.77 30.625713 134.690002 23.225000
2 4.73 30.138571 132.250000 23.040001
3 5.07 30.082857 130.000000 23.290001
4 4.98 30.282858 133.520004 23.389999
a b c d
6 4.82 29.674286 127.349998 23.700001
7 4.83 30.092857 129.110001 24.254999
8 4.85 29.918571 127.349998 24.695000
9 4.70 29.418571 127.139999 24.424999
10 4.69 30.719999 127.610001 25.200001
a b c d
12 4.43 29.724285 126.620003 24.764999
a b c d
14 4.29 29.01 120.309998 24.730
15 4.11 29.42 119.480003 25.035
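To make the intermediate steps visible, here is a minimal, self-contained sketch that rebuilds the question's dataframe and prints the cumsum labels before grouping. The names is_sep, labels and parts are only illustrative, and filtering out the separator rows before the groupby is a variant I am assuming here (it avoids empty pieces when two all-NaN rows are adjacent), not part of the answer above.

```python
import numpy as np
import pandas as pd

# The dataframe from the question, rebuilt so the sketch runs on its own
rows = [
    [4.65, 30.572857, 133.899994, 23.705000],
    [4.77, 30.625713, 134.690002, 23.225000],
    [4.73, 30.138571, 132.250000, 23.040001],
    [5.07, 30.082857, 130.000000, 23.290001],
    [4.98, 30.282858, 133.520004, 23.389999],
    [np.nan] * 4,
    [4.82, 29.674286, 127.349998, 23.700001],
    [4.83, 30.092857, 129.110001, 24.254999],
    [4.85, 29.918571, 127.349998, 24.695000],
    [4.70, 29.418571, 127.139999, 24.424999],
    [4.69, 30.719999, 127.610001, 25.200001],
    [np.nan] * 4,
    [4.43, 29.724285, 126.620003, 24.764999],
    [np.nan] * 4,
    [4.29, 29.010000, 120.309998, 24.730000],
    [4.11, 29.420000, 119.480003, 25.035000],
]
df = pd.DataFrame(rows, columns=list("abcd"))

# True for the separator rows (every column is NaN)
is_sep = df.isna().all(axis=1)
# Running count of separators seen so far: one label per block
labels = is_sep.cumsum()

# Assumed variant: drop the separators first, then group the remaining rows
parts = [g for _, g in df[~is_sep].groupby(labels[~is_sep])]

print(labels.tolist())          # block label for every row
print([len(p) for p in parts])  # number of rows in each piece
```

Each separator row carries the label of the block that follows it, which is why it has to be dropped either before or after the groupby.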
Something like this should do the trick:
import numpy as np
import pandas as pd

data_frame = pd.DataFrame({"a":[1,np.nan,3,np.nan,4,np.nan,5],
                           "b":[1,np.nan,3,np.nan,4,np.nan,5],
                           "c":[1,np.nan,3,np.nan,4,np.nan,5],
                           "d":[1,np.nan,3,np.nan,4,np.nan,5],
                           "e":[1,np.nan,3,np.nan,4,np.nan,5],
                           "f":[1,np.nan,3,np.nan,4,np.nan,5]})

# Row positions at which every column is NaN
all_nan = np.flatnonzero(data_frame.isnull().all(axis=1))

df_list = []
prev = 0
for i in all_nan:
    # Rows between the previous separator and this one
    df_list.append(data_frame.iloc[prev:i])
    prev = i + 1
# Keep the rows after the last all-NaN row as well
df_list.append(data_frame.iloc[prev:])

for part in df_list:
    print(part)
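The same loop can be wrapped in a small generator. This is only a sketch: split_on_all_nan_rows is a hypothetical helper name, and the demo data (with two adjacent all-NaN rows and a non-numeric index) is made up to exercise the edge cases. It slices by row position with iloc, so it does not rely on a default integer index, and it skips the empty blocks that adjacent separators would otherwise produce.

```python
import numpy as np
import pandas as pd


def split_on_all_nan_rows(frame):
    """Yield the non-empty blocks of rows between all-NaN rows (hypothetical helper)."""
    # Row positions (not labels) of the separator rows
    seps = np.flatnonzero(frame.isna().all(axis=1).to_numpy())
    start = 0
    # Appending len(frame) as a final stop also emits the trailing block
    for stop in [*seps, len(frame)]:
        if stop > start:  # skip empty blocks caused by adjacent separators
            yield frame.iloc[start:stop]
        start = stop + 1


# Made-up demo data: rows "w" and "x" are adjacent separators
values = [1, np.nan, 3, np.nan, np.nan, 4, 5]
demo = pd.DataFrame({"a": values, "b": values}, index=list("tuvwxyz"))

pieces = list(split_on_all_nan_rows(demo))
print([p.index.tolist() for p in pieces])
```

A generator keeps memory use low for large frames; wrap the call in list() when all pieces are needed at once.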
Another flavor of the same thing:
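# Row positions of the all-NaN rows
nan_pos = np.flatnonzero(df.isna().all(axis=1))
# Split the row positions rather than the DataFrame itself
chunks = np.split(np.arange(len(df)), nan_pos)
df_list = [df.iloc[idx].dropna(how="all") for idx in chunks]
df_list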
[ a b c d
0 4.65 30.572857 133.899994 23.705000
1 4.77 30.625713 134.690002 23.225000
2 4.73 30.138571 132.250000 23.040001
3 5.07 30.082857 130.000000 23.290001
4 4.98 30.282858 133.520004 23.389999,
a b c d
6 4.82 29.674286 127.349998 23.700001
7 4.83 30.092857 129.110001 24.254999
8 4.85 29.918571 127.349998 24.695000
9 4.70 29.418571 127.139999 24.424999
10 4.69 30.719999 127.610001 25.200001,
a b c d
12 4.43 29.724285 126.620003 24.764999,
a b c d
14 4.29 29.01 120.309998 24.730
15 4.11 29.42 119.480003 25.035]
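Whichever variant is used, a quick sanity check is that concatenating the pieces gives back the original frame minus its all-NaN rows. The sketch below runs that check on a small made-up frame, splitting row positions with np.split; the data and variable names are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Small made-up frame with two all-NaN separator rows
df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 3.0, np.nan, 4.0],
                   "b": [5.0, 6.0, np.nan, 7.0, np.nan, 8.0]})

# Split the row positions at every separator, then select the rows
nan_pos = np.flatnonzero(df.isna().all(axis=1))
chunks = np.split(np.arange(len(df)), nan_pos)
df_list = [df.iloc[idx].dropna(how="all") for idx in chunks]

# Gluing the pieces back together should give the frame minus its separators
rebuilt = pd.concat(df_list)
print(rebuilt.equals(df.dropna(how="all")))
print([len(part) for part in df_list])
```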