Split a Pandas dataframe into multiple dataframes based on the value of a column
How can I split a Pandas dataframe into multiple dataframes based on the value in a column?
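import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [4, 5, 0, 0, 5, 0, 0, 4],
                   'B': [7, 8, 0, 0, 4, 0, 0, 0],
                   'C': [1, 3, 0, 0, 7, 0, 0, 0]}, columns=['A', 'B', 'C'])
df["sum"] = df.sum(axis=1)  # row-wise total of A, B and C
df["Rolling_sum"] = df["sum"].rolling(2, min_periods=1).sum()  # current row plus the previous row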
The resulting dataframe is:
   A  B  C  sum  Rolling_sum
0  4  7  1   12         12.0
1  5  8  3   16         28.0
2  0  0  0    0         16.0
3  0  0  0    0          0.0
4  5  4  7   16         16.0
5  0  0  0    0         16.0
6  0  0  0    0          0.0
7  4  0  0    4          4.0
I want to split the dataframe into multiple dataframes based on the occurrence of 0 in the Rolling_sum column.
Expected result:
Dataframe 1:
   A  B  C  sum  Rolling_sum
0  4  7  1   12         12.0
1  5  8  3   16         28.0
2  0  0  0    0         16.0
Dataframe 2:
   A  B  C  sum  Rolling_sum
4  5  4  7   16         16.0
5  0  0  0    0         16.0
Dataframe 3:
   A  B  C  sum  Rolling_sum
7  4  0  0    4          4.0
I'm not sure what condition(s) I can use to split the dataframe.
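You can use cumsum to create the groupby key, then groupby on it. df['Rolling_sum'].eq(0).cumsum() increases by one at every zero row, so all rows between two zeros share the same key; filtering with .ne(0) first drops the zero rows themselves: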
d = {key: group for key, group in df.loc[df['Rolling_sum'].ne(0)].groupby(df['Rolling_sum'].eq(0).cumsum())}
d
Out[260]:
{0:    A  B  C  sum  Rolling_sum
 0  4  7  1   12         12.0
 1  5  8  3   16         28.0
 2  0  0  0    0         16.0,
 1:    A  B  C  sum  Rolling_sum
 4  5  4  7   16         16.0
 5  0  0  0    0         16.0,
 2:    A  B  C  sum  Rolling_sum
 7  4  0  0    4          4.0}
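To see why this works, it can help to look at the key on its own. The sketch below rebuilds the sample frame from the question and prints the boolean mask next to its cumulative sum. The names is_zero, key and parts are illustrative only and do not come from the answer.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({'A': [4, 5, 0, 0, 5, 0, 0, 4],
                   'B': [7, 8, 0, 0, 4, 0, 0, 0],
                   'C': [1, 3, 0, 0, 7, 0, 0, 0]})
df['sum'] = df.sum(axis=1)
df['Rolling_sum'] = df['sum'].rolling(2, min_periods=1).sum()

is_zero = df['Rolling_sum'].eq(0)  # True on the separator rows
key = is_zero.cumsum()             # running count of separators seen so far

# Show the helper columns side by side with the data.
print(df.assign(is_zero=is_zero, key=key))

# Drop the separator rows, then group the remaining rows by the key.
parts = [group for _, group in df[~is_zero].groupby(key[~is_zero])]
for part in parts:
    print(part, end='\n\n')
```

Each separator row takes the new key value itself, which is why it has to be filtered out before grouping; otherwise it would show up as the first row of the following piece.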
Note that if you don't mind keeping the zeros, np.split is easiest:
np.split(df, df.index[df['Rolling_sum'] == 0])
# [   A  B  C  sum  Rolling_sum
#  0  4  7  1   12         12.0
#  1  5  8  3   16         28.0
#  2  0  0  0    0         16.0,
#
#     A  B  C  sum  Rolling_sum
#  3  0  0  0    0          0.0
#  4  5  4  7   16         16.0
#  5  0  0  0    0         16.0,
#
#     A  B  C  sum  Rolling_sum
#  6  0  0  0    0          0.0
#  7  4  0  0    4          4.0]
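Two caveats, offered tentatively. First, df.index[...] hands index labels to np.split, which interprets them as positions, so the line above relies on the default 0..n-1 index. Second, recent pandas versions may emit a FutureWarning when np.split is called on a DataFrame. Below is a minimal sketch of the same split using plain iloc slicing, assuming the sample frame from the question; zero_pos, bounds and parts are names made up for this example.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({'A': [4, 5, 0, 0, 5, 0, 0, 4],
                   'B': [7, 8, 0, 0, 4, 0, 0, 0],
                   'C': [1, 3, 0, 0, 7, 0, 0, 0]})
df['sum'] = df.sum(axis=1)
df['Rolling_sum'] = df['sum'].rolling(2, min_periods=1).sum()

# Positions (not labels) of the rows where Rolling_sum is 0.
zero_pos = np.flatnonzero(df['Rolling_sum'].to_numpy() == 0)

# Add both ends so that consecutive pairs describe one slice each.
bounds = [0, *zero_pos.tolist(), len(df)]
parts = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

for part in parts:
    print(part, end='\n\n')
```

Because the slicing is positional, this version should behave the same if the frame has a non-default index, such as dates or strings.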
If you want to ignore the zeros, BENY's answer is probably simplest, but you can still do it with np.split by adjusting the cut points to account for the missing rows:
cuts = np.where(df['Rolling_sum'] == 0)[0]  # positions of the zero rows: [3, 6]
cuts -= np.arange(len(cuts))                # each earlier zero row that is dropped shifts a cut left by one: [3, 5]
np.split(df[df['Rolling_sum'] != 0], cuts)
# [   A  B  C  sum  Rolling_sum
#  0  4  7  1   12         12.0
#  1  5  8  3   16         28.0
#  2  0  0  0    0         16.0,
#
#     A  B  C  sum  Rolling_sum
#  4  5  4  7   16         16.0
#  5  0  0  0    0         16.0,
#
#     A  B  C  sum  Rolling_sum
#  7  4  0  0    4          4.0]
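If you would rather avoid the cut-point arithmetic, here is a rough sketch that slices by position and skips the zero rows directly. split_on_zero is a hypothetical helper written for this example, not a pandas or NumPy function. It also skips the empty pieces that would otherwise appear when zeros are adjacent or sit at either end of the frame.

```python
import numpy as np
import pandas as pd


def split_on_zero(frame, column):
    """Hypothetical helper: split `frame` at rows where `column` is 0, dropping those rows."""
    zero_pos = np.flatnonzero(frame[column].to_numpy() == 0)
    starts = [0, *(zero_pos + 1).tolist()]    # each piece starts just after a zero row
    stops = [*zero_pos.tolist(), len(frame)]  # and stops just before the next zero row
    return [frame.iloc[a:b] for a, b in zip(starts, stops) if b > a]


# Same sample data as in the question.
df = pd.DataFrame({'A': [4, 5, 0, 0, 5, 0, 0, 4],
                   'B': [7, 8, 0, 0, 4, 0, 0, 0],
                   'C': [1, 3, 0, 0, 7, 0, 0, 0]})
df['sum'] = df.sum(axis=1)
df['Rolling_sum'] = df['sum'].rolling(2, min_periods=1).sum()

parts = split_on_zero(df, 'Rolling_sum')
for part in parts:
    print(part, end='\n\n')
```

On the sample data this should give three pieces holding rows 0 to 2, rows 4 to 5, and row 7, matching the expected result in the question.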