I have a pandas DataFrame that contains cumulative data. One of the columns identifies a new data set. Is there a good way to identify where the column changes value and store the previous row in a new DataFrame?
Data example:
step_ID value1 value2 test_step
31 1 2 2
31 2 3 2
31 3 5 2
35 1 5 2
35 2 8 2
I would like to save the values from the last row where step_ID = 31. I don't always know how many rows there are between changes in step_ID, as this DataFrame is already sorted by test_step.
If you simply want to split your DataFrame according to step_ID, you can use the groupby method:
df_list = [x for _, x in df.groupby("step_ID")]
The variable df_list will store a list of the resulting DataFrames, one for each step_ID value.
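A minimal, self-contained sketch of the split, assuming the sample DataFrame is rebuilt by hand from the table in the question. Note that groupby sorts the keys by default, so df_list is ordered by ascending step_ID rather than by order of appearance, and rows sharing a step_ID end up in the same group even if they are not adjacent.

```python
import pandas as pd

# Sample data rebuilt from the example table in the question.
df = pd.DataFrame(
    {
        "step_ID": [31, 31, 31, 35, 35],
        "value1": [1, 2, 3, 1, 2],
        "value2": [2, 3, 5, 5, 8],
        "test_step": [2, 2, 2, 2, 2],
    }
)

# One sub-DataFrame per distinct step_ID (groupby sorts the keys by default).
df_list = [x for _, x in df.groupby("step_ID")]

for part in df_list:
    # Print the group's step_ID and how many rows it holds.
    print(part["step_ID"].iloc[0], len(part))
```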
Now, to save only the last row, you can iterate through df_list and keep only the last row of each DataFrame:
last_rows = [d.iloc[-1] for d in df_list]
The variable last_rows will store a list of Series objects, each of which represents the last row of a DataFrame in df_list.
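Since the question asks for the rows in a new DataFrame, the list of Series can be stacked back into one. The sketch below (again using the sample data from the question) shows that, plus groupby(...).tail(1) as an alternative that skips the intermediate list. tail(1) keeps each column's original dtype, whereas turning a row into a Series can upcast when the columns have mixed types.

```python
import pandas as pd

# Sample data rebuilt from the example table in the question.
df = pd.DataFrame(
    {
        "step_ID": [31, 31, 31, 35, 35],
        "value1": [1, 2, 3, 1, 2],
        "value2": [2, 3, 5, 5, 8],
        "test_step": [2, 2, 2, 2, 2],
    }
)

df_list = [x for _, x in df.groupby("step_ID")]
last_rows = [d.iloc[-1] for d in df_list]

# Stack the Series into a DataFrame; each Series' name (the original row
# label) becomes the index of the new DataFrame.
last_df = pd.DataFrame(last_rows)

# Alternative: keep the last row of each group directly.
last_df_tail = df.groupby("step_ID").tail(1)

print(last_df)
print(last_df_tail)
```

Both results contain the rows originally labelled 2 and 4.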
Edit:
A cleaner way to save only the last rows is to use the method pointed out by @Rick M:
df[(df.step_ID != df.step_ID.shift(-1))].copy()
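A runnable version of the shift approach, with comments on why it works. shift(-1) moves every step_ID up one row, so each row is compared with the row after it; the final row is compared with NaN, which counts as a change, so the last block is kept too. Unlike groupby, this keeps the last row of each contiguous run, which matters if a step_ID reappears later. The second frame, df2, is hypothetical data made up only to show that difference.

```python
import pandas as pd

# Sample data rebuilt from the example table in the question.
df = pd.DataFrame(
    {
        "step_ID": [31, 31, 31, 35, 35],
        "value1": [1, 2, 3, 1, 2],
        "value2": [2, 3, 5, 5, 8],
        "test_step": [2, 2, 2, 2, 2],
    }
)

# True where the next row has a different step_ID (or there is no next row).
is_last_in_run = df["step_ID"] != df["step_ID"].shift(-1)

# .copy() gives an independent DataFrame, so later edits to it do not
# raise SettingWithCopyWarning.
last_rows_df = df[is_last_in_run].copy()
print(last_rows_df)  # rows labelled 2 and 4

# Hypothetical data where step_ID returns to 31 after 35.
df2 = pd.DataFrame({"step_ID": [31, 31, 35, 31], "value1": [1, 2, 1, 1]})

# shift keeps one row per contiguous run: labels 1, 2 and 3.
runs = df2[df2["step_ID"] != df2["step_ID"].shift(-1)]

# groupby merges both runs of 31 into one group: labels 2 and 3 only.
groups = df2.groupby("step_ID").tail(1)

print(runs)
print(groups)
```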