
When using Pandas groupby, how do I start the next group when a column value is met?

I have a DataFrame with a column called "Current_Position". I want to split the DataFrame into groups whenever the value of "Current_Position" equals 0. The row in which the 0 occurs should be the last row of the current group, and the next row should start the next group. How do I accomplish this?

    Current_Position
0   2
1   4
2   2
3   0
4   2
5   0
6   2
7   0
8   1
9   2
10  0
11  2
12  1
13  0
14  1
15  2
16  1
17  0
18  1
19  0

Expected Output:

    Current_Position  Group
0                  2      0
1                  4      0
2                  2      0
3                  0      0
4                  2      1
5                  0      1
6                  2      2
7                  0      2
8                  1      3
9                  2      3
10                 0      3
11                 2      4
12                 1      4
13                 0      4
14                 1      5
15                 2      5
16                 1      5
17                 0      5
18                 1      6
19                 0      6

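You can compare the column to zero to get a boolean Series of True or False values. To make each 0 the last row of its group rather than the first, shift the column down one row with .shift() before comparing, so that True marks the row after each 0. Then take the .cumsum() of the result to get the group numbers: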

df['Group'] = (df['Current_Position'].shift() == 0).cumsum()
df
Out[1]: 
    Current_Position  Group
0                  2      0
1                  4      0
2                  2      0
3                  0      0
4                  2      1
5                  0      1
6                  2      2
7                  0      2
8                  1      3
9                  2      3
10                 0      3
11                 2      4
12                 1      4
13                 0      4
14                 1      5
15                 2      5
16                 1      5
17                 0      5
18                 1      6
19                 0      6
  1. We have used .shift() to shift the data down one row. This makes each row with a 0 value the last row of its group instead of the first row of the next one.
  2. We have used == 0 to compare the shifted values to zero, which produces a boolean Series of True or False . Values within a boolean Series are essentially the equivalent of 1 or 0 , so you can use .cumsum() , .sum() or other mathematical operations on them. These operations would not give you a numeric count if, for example, we had created a column of 'True' or 'False' STRINGS with something like `df['Group'] = np.where(df['Current_Position'] == 0, 'True', 'False')`.
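
To make point 2 concrete, here is a minimal sketch on a shortened copy of the data. The names is_after_zero and as_strings are illustrative only and not part of the original answer. It shows that the comparison yields a bool dtype whose cumulative sum matches that of the equivalent 1/0 integers, while the string version is not boolean at all:

```python
import numpy as np
import pandas as pd

# Shortened sample of the question's data (first six rows).
df = pd.DataFrame({"Current_Position": [2, 4, 2, 0, 2, 0]})

# True on the row that follows a 0; the first row compares NaN == 0, which is False.
is_after_zero = df["Current_Position"].shift() == 0

# Booleans behave like 1/0, so cumsum counts how many group starts have been seen.
groups_from_bool = is_after_zero.cumsum()
groups_from_int = is_after_zero.astype(int).cumsum()

# Illustrative counter-example: 'True'/'False' strings are text, not booleans.
as_strings = pd.Series(np.where(is_after_zero, "True", "False"), index=df.index)

print(is_after_zero.dtype)        # bool
print(groups_from_bool.tolist())  # [0, 0, 0, 0, 1, 1]
```

Depending on the pandas version, calling .cumsum() on the string column would either concatenate the text or raise an error, so in neither case does it produce group numbers.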

Below is a breakdown of the logic in three steps, so it can be easily visualized:

df['Group1'] = df['Current_Position'].shift()
df['Group2'] = (df['Group1'] == 0)
df['Group3'] = df['Group2'].cumsum()
df
Out[2]: 
    Current_Position  Group1  Group2  Group3
0                  2     NaN   False       0
1                  4     2.0   False       0
2                  2     4.0   False       0
3                  0     2.0   False       0
4                  2     0.0    True       1
5                  0     2.0   False       1
6                  2     0.0    True       2
7                  0     2.0   False       2
8                  1     0.0    True       3
9                  2     1.0   False       3
10                 0     2.0   False       3
11                 2     0.0    True       4
12                 1     2.0   False       4
13                 0     1.0   False       4
14                 1     0.0    True       5
15                 2     1.0   False       5
16                 1     2.0   False       5
17                 0     1.0   False       5
18                 1     0.0    True       6
19                 0     1.0   False       6
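
Since the question is about groupby, here is a sketch of how the resulting Group column might then be used. It rebuilds the sample data from the question; the variable names chunk and sizes are assumptions made for illustration:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"Current_Position": [2, 4, 2, 0, 2, 0, 2, 0, 1, 2,
                                        0, 2, 1, 0, 1, 2, 1, 0, 1, 0]})

# Same one-liner as in the answer: a new group starts on the row after each 0.
df["Group"] = (df["Current_Position"].shift() == 0).cumsum()

# Iterate over the groups; each chunk ends with the row holding the 0.
for group_id, chunk in df.groupby("Group"):
    print(group_id, chunk["Current_Position"].tolist())

# Number of rows per group, in group order.
sizes = df.groupby("Group").size().tolist()
print(sizes)  # [4, 2, 2, 3, 3, 4, 2]
```

You can also pass the expression straight to groupby without storing it as a column, if you do not need Group in the DataFrame afterwards.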
