
Finding the row where a column value changes in a DataFrame

I have a pandas DataFrame that contains cumulative data. One of the columns identifies a new data set. Is there a good way to identify where the column changes value and store the previous row in a new DataFrame?

Data example:

step_ID   value1    value2  test_step
31        1         2        2
31        2         3        2
31        3         5        2
35        1         5        2  
35        2         8        2 

I would like to save the values from the last row where step_ID = 31. I don't always know how many rows lie between changes, as this DataFrame is already sorted by test_step.

If you simply want to split your DataFrame according to step_ID, you can use the groupby method:

df_list = [x for _, x in df.groupby("step_ID")]

The variable df_list will hold a list of the resulting DataFrames, one per step_ID value.
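
As a quick illustration, here is a minimal sketch that rebuilds the sample data from the question by hand (the column values are taken from the table above) and splits it with groupby:

import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "step_ID":   [31, 31, 31, 35, 35],
    "value1":    [1, 2, 3, 1, 2],
    "value2":    [2, 3, 5, 5, 8],
    "test_step": [2, 2, 2, 2, 2],
})

# One DataFrame per step_ID value (groups come out in sorted key order)
df_list = [x for _, x in df.groupby("step_ID")]
for part in df_list:
    print(part, end="\n\n")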

Now, to save only the last rows, you can iterate through df_list and keep the last row of each DataFrame:

last_rows = [d.iloc[-1] for d in df_list]

The variable last_rows will store a list of Series objects, each of which represents the last row of a DataFrame in df_list.
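
If you would rather have those last rows back in a single DataFrame instead of a list of Series, one option (a sketch, assuming last_rows from above) is:

# Reassemble the Series into one DataFrame; each Series becomes a row
# and keeps its original index label
result = pd.DataFrame(last_rows)
print(result)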


Edit:

A cleaner way to save only the last rows is to use the method pointed out by @Rick M:

df[(df.step_ID != df.step_ID.shift(-1))].copy()
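
Here, shift(-1) aligns each row with the step_ID of the following row; wherever the two differ (including the final row, where the shifted value is NaN), that row is the last one for its step_ID. As a sketch using the sample df built earlier, this keeps the last row of each step_ID group:

last_per_step = df[df.step_ID != df.step_ID.shift(-1)].copy()
print(last_per_step)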
