Finding the row where a column value changes in a DataFrame

Question

I have a pandas DataFrame that contains cumulative data. One of the columns identifies a new data set. Is there a good way to identify where that column changes value and store the previous row in a new DataFrame?

Data example:

step_ID   value1    value2  test_step
31        1         2        2
31        2         3        2
31        3         5        2
35        1         5        2  
35        2         8        2

I would like to save the values from the last row where step_ID = 31. I don't always know how many rows there are between changes. The DataFrame is already sorted by test_step.

Answer 1

If you simply want to split your DataFrame according to step_ID, you can use the groupby method:

df_list = [x for _, x in df.groupby("step_ID")]

The variable df_list will hold a list of the resulting DataFrames, one per distinct step_ID value.

Now, to save only the last row, you can iterate through df_list and keep only the last row of each DataFrame:

last_rows = [d.iloc[-1] for d in df_list]

The variable last_rows will hold a list of Series objects, each of which represents the last row of one DataFrame in df_list.
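Putting the two steps together, here is a minimal sketch run on the sample data from the question. The name last_df and the final pd.DataFrame(...) call are my additions, used here to turn the list of Series back into a single DataFrame:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "step_ID": [31, 31, 31, 35, 35],
    "value1": [1, 2, 3, 1, 2],
    "value2": [2, 3, 5, 5, 8],
    "test_step": [2, 2, 2, 2, 2],
})

# One DataFrame per distinct step_ID
df_list = [x for _, x in df.groupby("step_ID")]

# Last row of each group, as a list of Series
last_rows = [d.iloc[-1] for d in df_list]

# Assumption: you want a DataFrame back rather than a list of Series
last_df = pd.DataFrame(last_rows)
print(last_df)
```

Note that groupby collects all rows sharing a step_ID, even if they are not adjacent, so this only marks "changes" when each step_ID appears as one contiguous run.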

Edit:

A cleaner way to keep only the last rows is the method pointed out by @Rick M, which compares each step_ID with the one in the next row and keeps the rows where they differ:

df[(df.step_ID != df.step_ID.shift(-1))].copy()
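As a rough illustration of how the shift comparison behaves on the sample data from the question (the names is_last_of_run, changes and before_change are mine, and the sketch assumes step_ID itself contains no missing values):

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "step_ID": [31, 31, 31, 35, 35],
    "value1": [1, 2, 3, 1, 2],
    "value2": [2, 3, 5, 5, 8],
    "test_step": [2, 2, 2, 2, 2],
})

# step_ID of the following row; the last row has no successor, so it gets NaN
next_step = df["step_ID"].shift(-1)

# True where the next row has a different step_ID (NaN never compares equal,
# so the very last row of the DataFrame is also flagged)
is_last_of_run = df["step_ID"] != next_step
changes = df[is_last_of_run].copy()

# Assumption: if only rows followed by an actual change are wanted,
# drop the final row by requiring that a next row exists
before_change = df[is_last_of_run & next_step.notna()].copy()

print(changes)
print(before_change)
```

Unlike groupby, this compares neighbouring rows only, so a step_ID that reappears later in the DataFrame is reported once for each run.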

1 answer

solution1
1 ACCEPTED 2021-02-09 13:58:08