Split a Pandas dataframe into multiple dataframes based on the value in a column

Question

How can I split a Pandas dataframe into multiple dataframes based on the value in a column?

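import pandas as pd

df = pd.DataFrame({'A':[4,5,0,0,5,0,0,4],
                   'B':[7,8,0,0,4,0,0,0],
                   'C':[1,3,0,0,7,0,0,0]}, columns = ['A','B','C'])

# Row-wise total of A, B and C
df["sum"] = df.sum(axis=1)
# Sum of the current and previous row totals (just the current one for row 0)
df["Rolling_sum"] = df["sum"].rolling(2, min_periods=1).sum()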

The resulting dataframe is:

   A  B  C  sum  Rolling_sum
0  4  7  1   12         12.0
1  5  8  3   16         28.0
2  0  0  0    0         16.0
3  0  0  0    0          0.0
4  5  4  7   16         16.0
5  0  0  0    0         16.0
6  0  0  0    0          0.0
7  4  0  0    4          4.0

I want to split the dataframe into multiple dataframes at each occurrence of 0 in the Rolling_sum column, leaving the zero rows themselves out.

Expected result:

Dataframe 1:

   A  B  C  sum  Rolling_sum
0  4  7  1   12         12.0
1  5  8  3   16         28.0
2  0  0  0    0         16.0

Dataframe 2:

   A  B  C  sum  Rolling_sum
4  5  4  7   16         16.0
5  0  0  0    0         16.0

Dataframe 3:

   A  B  C  sum  Rolling_sum
7  4  0  0    4          4.0

I'm not sure what condition(s) I can use to split the dataframe.

Answer 1

You can build the groupby key with cumsum over the rows where Rolling_sum is 0, filter those rows out, and then groupby on that key:

d = {x : y for x , y in df.loc[df['Rolling_sum'].ne(0)].groupby(df['Rolling_sum'].eq(0).cumsum())}
d
Out[260]: 
{0:    A  B  C  sum  Rolling_sum
 0  4  7  1   12         12.0
 1  5  8  3   16         28.0
 2  0  0  0    0         16.0, 1:    A  B  C  sum  Rolling_sum
 4  5  4  7   16         16.0
 5  0  0  0    0         16.0, 2:    A  B  C  sum  Rolling_sum
 7  4  0  0    4          4.0}
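
The one-liner may be easier to follow when broken into steps. The following is a minimal sketch that rebuilds the question's dataframe and names each intermediate; the variable names (is_zero, group_key, pieces) are illustrative and not from the answer, and the key is filtered explicitly here rather than relying on index alignment inside groupby.

```python
import pandas as pd

df = pd.DataFrame({'A': [4, 5, 0, 0, 5, 0, 0, 4],
                   'B': [7, 8, 0, 0, 4, 0, 0, 0],
                   'C': [1, 3, 0, 0, 7, 0, 0, 0]})
df["sum"] = df.sum(axis=1)
df["Rolling_sum"] = df["sum"].rolling(2, min_periods=1).sum()

# True on the separator rows (Rolling_sum == 0)
is_zero = df["Rolling_sum"].eq(0)

# Running count of separator rows seen so far; it increases by one at
# each separator, so every block between separators shares one value.
group_key = is_zero.cumsum()

# Drop the separator rows, then group what is left by the running count.
pieces = {key: part
          for key, part in df[~is_zero].groupby(group_key[~is_zero])}

# Show which original row labels ended up in each piece.
for key, part in pieces.items():
    print(key, part.index.tolist())
```

Because the separator rows carry the incremented key themselves, they have to be filtered out before grouping; otherwise each piece after the first would start with a zero row.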

Answer 2

Note that if you don't mind keeping the zeros, np.split is easiest:

import numpy as np

# Cut before every row whose Rolling_sum is 0 (index labels 3 and 6 here)
np.split(df, df.index[df['Rolling_sum'] == 0])

# [   A  B  C  sum  Rolling_sum
#  0  4  7  1   12         12.0
#  1  5  8  3   16         28.0
#  2  0  0  0    0         16.0,
#
#     A  B  C  sum  Rolling_sum
#  3  0  0  0    0          0.0
#  4  5  4  7   16         16.0
#  5  0  0  0    0         16.0,
#
#     A  B  C  sum  Rolling_sum
#  6  0  0  0    0          0.0
#  7  4  0  0    4          4.0]

If you want to ignore the zeros, BENY's answer is probably simplest, but you can still do it with np.split by adjusting the cut points to account for the removed rows:

cuts = np.where(df['Rolling_sum'] == 0)[0] # positions of the zero rows: [3, 6]
cuts -= np.arange(len(cuts))               # shift by the zero rows dropped before each cut: [3, 5]
np.split(df[df['Rolling_sum'] != 0], cuts)

# [   A  B  C  sum  Rolling_sum
#  0  4  7  1   12         12.0
#  1  5  8  3   16         28.0
#  2  0  0  0    0         16.0,
#
#     A  B  C  sum  Rolling_sum
#  4  5  4  7   16         16.0
#  5  0  0  0    0         16.0,
#
#     A  B  C  sum  Rolling_sum
#  7  4  0  0    4          4.0]
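
Passing a DataFrame to np.split can emit deprecation warnings with some newer pandas releases, so here is a rough sketch of the same idea using only positional slicing with iloc. The helper split_on_zero and its keep_zeros flag are hypothetical names made up for this example; they are not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd


def split_on_zero(df, column, keep_zeros=False):
    """Split df into consecutive pieces at rows where df[column] == 0.

    Hypothetical helper for illustration; not part of pandas or NumPy.
    """
    # Positions (not labels) of the rows that act as separators.
    zero_pos = np.flatnonzero(df[column].to_numpy() == 0)
    # Each piece ends just before a zero row, or at the end of the frame.
    stops = np.append(zero_pos, len(df))
    # A piece starts at the zero row if zeros are kept, else one row past it.
    offset = 0 if keep_zeros else 1
    starts = np.append(0, zero_pos + offset)
    # Skip empty pieces, e.g. those produced by back-to-back zero rows.
    return [df.iloc[a:b] for a, b in zip(starts, stops) if b > a]


df = pd.DataFrame({'A': [4, 5, 0, 0, 5, 0, 0, 4],
                   'B': [7, 8, 0, 0, 4, 0, 0, 0],
                   'C': [1, 3, 0, 0, 7, 0, 0, 0]})
df["sum"] = df.sum(axis=1)
df["Rolling_sum"] = df["sum"].rolling(2, min_periods=1).sum()

# Print the row labels of each piece, first without and then with the zero rows.
for part in split_on_zero(df, "Rolling_sum"):
    print(part.index.tolist())
for part in split_on_zero(df, "Rolling_sum", keep_zeros=True):
    print(part.index.tolist())
```

Unlike np.split, this sketch drops empty pieces, which matters when two zero rows are adjacent or when the first row is a zero row.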

Answer 1: score 1, accepted, posted 2021-11-19 21:53:40

Answer 2: score 1, posted 2021-11-19 22:21:24
