How to break a pandas dataframe into sub dataframes when a certain value is found in the dataframe column?

Question

I have a dataframe that looks like this:

import pandas as pd
data = pd.DataFrame({"event": ["A", "B", "C", "A", "A", "E", "P", "S", "A", "Y", "A"]})
data.head(15)

    event
  0 A
  1 B
  2 C
  3 A
  4 A
  5 E
  6 P
  7 S
  8 A
  9 Y
 10 A

I want to split this dataframe into smaller dataframes, starting a new one every time the event "A" appears. In this case that gives five dataframes:

1)    event
    0   A
    1   B
    2   C

2)    event
    0   A

3)    event
    0   A
    1   E
    2   P
    3   S
    
4)    event
    0   A
    1   Y

5)    event
    0   A

Is there an elegant way to do this with pandas, and also with PySpark?

Answer 1

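With pandas, use groupby with a helper grouper, data['event'].eq('A').cumsum(). The comparison is True on each "A" row, and the cumulative sum turns that into a counter that goes up by one at every "A", so all rows from one "A" up to (but not including) the next share the same group key: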

dfs = [g for _, g in data.groupby(data['event'].eq('A').cumsum())]
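To see why this works, it can help to print the helper key on its own. A minimal sketch using the question's data; the names starts and key are introduced here only for illustration:

```python
import pandas as pd

data = pd.DataFrame({"event": ["A", "B", "C", "A", "A", "E", "P", "S", "A", "Y", "A"]})

# True on every row where a new block starts (the row is an "A")
starts = data["event"].eq("A")

# Running count of block starts: each row gets the number of "A"s seen so far,
# so every row between two "A"s shares the same group key
key = starts.cumsum()
print(key.tolist())  # [1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 5]

# Grouping on that key yields one sub-dataframe per block
dfs = [g for _, g in data.groupby(key)]
print([len(g) for g in dfs])  # [3, 1, 4, 2, 1]
```

The group sizes match the five dataframes in the question.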

Or, to renumber each sub-dataframe's index from 0, add reset_index:

dfs = [g.reset_index(drop=True)
       for _, g in data.groupby(data['event'].eq('A').cumsum())]
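As a quick check, a self-contained sketch that runs the reset_index variant and prints the pieces in the numbered layout used in the question; the print loop is only for display:

```python
import pandas as pd

data = pd.DataFrame({"event": ["A", "B", "C", "A", "A", "E", "P", "S", "A", "Y", "A"]})

# Split on every "A" and renumber each piece's index from 0
dfs = [g.reset_index(drop=True)
       for _, g in data.groupby(data["event"].eq("A").cumsum())]

# Print as 1) ... 5) to compare against the expected output
for i, df in enumerate(dfs, start=1):
    print(f"{i})")
    print(df)
```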

Output (without reset_index):

[  event
 0     A
 1     B
 2     C,
   event
 3     A,
   event
 4     A
 5     E
 6     P
 7     S,
   event
 8     A
 9     Y,
    event
 10     A]
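One case worth checking: if the column does not start with "A", the rows before the first "A" get key 0 and form their own leading piece. A minimal sketch, assuming a hypothetical input that starts with other events, showing both keeping that piece and dropping it by filtering on key > 0:

```python
import pandas as pd

# Hypothetical input whose first rows come before any "A"
data = pd.DataFrame({"event": ["X", "Z", "A", "B", "A", "C"]})

key = data["event"].eq("A").cumsum()
print(key.tolist())  # [0, 0, 1, 1, 2, 2]

# Keep everything: rows before the first "A" become their own piece (key 0)
all_parts = [g.reset_index(drop=True) for _, g in data.groupby(key)]

# Drop the leading rows: keep only rows at or after the first "A"
mask = key > 0
a_parts = [g.reset_index(drop=True) for _, g in data[mask].groupby(key[mask])]
```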

Accepted answer, 2022-03-27 17:38:27

How to break a pandas dataframe into sub dataframes when a certain value is found in the dataframe column?

Question

1 answers

solution1 1 ACCPTED 2022-03-27 17:38:27

solution1
1 ACCPTED 2022-03-27 17:38:27