How to split a pandas dataframe into multiple parts based on consecutively occurring values in a column?

Question

I have a dataframe, shown below in tabular form. The original dataframe is much larger, so I cannot afford to loop over each row.

col1 | col2 | col3
a      x     1
b      y     1
c      z     0
d      k     1
e      l     1

What I want is to split it into sub-dataframes, each made up of a run of consecutive 1s in column col3. Ideally, the dataframe above should give me two dataframes, df1 and df2:

df1

col1 | col2 | col3
a      x     1
b      y     1

df2

col1 | col2 | col3
d      k     1
e      l     1

Is there an approach like groupby to do this? If I use groupby, it returns all 4 rows with col3 == 1 in a single dataframe. I do not want that, because I need two dataframes, each consisting of consecutively occurring 1s. One method is obviously to loop over the rows and return a dataframe whenever I find a 0, but that is not efficient. Any kind of help is appreciated.
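For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the sample table and shows the pooling behaviour of a plain groupby. The variable names df and pooled are only assumptions for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e"],
    "col2": ["x", "y", "z", "k", "l"],
    "col3": [1, 1, 0, 1, 1],
})

# A plain groupby on col3 pools every row with the same value,
# regardless of where those rows sit in the frame
pooled = df.groupby("col3").get_group(1)
print(pooled)
```

All four rows with col3 == 1 land in one group, which is why an extra run label is needed to tell the two blocks apart.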

Answer 1

First compare the values with 1, then create consecutive group labels with shift and a cumulative sum, and finally collect all groups in a list comprehension with groupby:

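# True where col3 equals 1
m1 = df['col3'].eq(1)
# the label increases by one every time the mask differs from the previous row
g = m1.ne(m1.shift()).cumsum()

# keep only the rows with 1 and collect each consecutive run as its own DataFrame
dfs = [x for _, x in df[m1].groupby(g)]
print(dfs)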
[  col1 col2  col3
0    a    x     1
1    b    y     1,   col1 col2  col3
3    d    k     1
4    e    l     1]

print (dfs[0])
  col1 col2  col3
0    a    x     1
1    b    y     1
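To see why the shift and cumulative sum trick yields one label per run, it may help to print the intermediate Series side by side. This is only an illustrative sketch on the question's sample data; the names changed and steps are not part of the answer's code.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "b", "c", "d", "e"],
    "col2": ["x", "y", "z", "k", "l"],
    "col3": [1, 1, 0, 1, 1],
})

m1 = df["col3"].eq(1)          # True, True, False, True, True
changed = m1.ne(m1.shift())    # True wherever the mask differs from the previous row
g = changed.cumsum()           # run labels: 1, 1, 2, 3, 3

# Put every step in one frame for inspection
steps = pd.DataFrame({"col3": df["col3"], "m1": m1, "changed": changed, "g": g})
print(steps)
```

The first row is always marked as changed because shift leaves a missing value there, so the labels start at 1. Filtering with df[m1] before grouping then discards the runs of 0 (label 2 here).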

If it is also necessary to remove runs that consist of a single 1 row, add Series.duplicated with keep=False:

print (df)
  col1 col2  col3
0    a    x     1
1    b    y     1
2    c    z     0
3    d    k     1
4    e    l     1
5    f    m     0
6    g    n     1 <- removed

m1 = df['col3'].eq(1)
g = m1.ne(m1.shift()).cumsum()

# keep only labels that occur more than once, i.e. runs of at least two rows
g = g[g.duplicated(keep=False)]
print(g)
0    1
1    1
3    3
4    3
Name: col3, dtype: int32

# rows of df[m1] whose index is missing from g (the single 1) get no label and are dropped
dfs = [x for _, x in df[m1].groupby(g)]
print(dfs)
[  col1 col2  col3
0    a    x     1
1    b    y     1,   col1 col2  col3
3    d    k     1
4    e    l     1]
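Both variants can be folded into one small function. The sketch below is a hypothetical helper (split_runs, value and min_length are names made up here, not pandas API); it swaps the Series.duplicated filter for a simple length check, so any minimum run length can be requested.

```python
import pandas as pd

def split_runs(df, column, value=1, min_length=1):
    """Hypothetical helper: one DataFrame per consecutive run of `value`."""
    mask = df[column].eq(value)
    # label each run of identical mask values with its own integer
    run_id = mask.ne(mask.shift()).cumsum()
    return [
        part
        for _, part in df[mask].groupby(run_id[mask])
        if len(part) >= min_length  # skip runs that are too short
    ]

df = pd.DataFrame({
    "col1": list("abcdefg"),
    "col2": list("xyzklmn"),
    "col3": [1, 1, 0, 1, 1, 0, 1],
})

parts = split_runs(df, "col3", value=1, min_length=2)
for part in parts:
    print(part, end="\n\n")
```

With min_length=2 the trailing single row g is skipped; with the default min_length=1 it comes back as a third one-row dataframe.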

Solution 1
4 2020-05-14 05:00:01