How to keep rows with same ID and consecutive values in Pandas?

Question

I think my problem is easy to understand, but I don't know how to solve it efficiently without loops.

My dataset (already sorted by ID and Value) has IDs, some features, and an integer Value column. For each ID, my goal is to keep the run of consecutive values that starts at its first row and drop everything after the first gap. If an ID has only one row, that row should be kept.

I think it is easier to understand with an example, so let me show you. My dataset looks like this:

import pandas as pd

d = {'Id': [1, 1, 1, 1, 2, 3, 3, 3], 'Feature': ['F1', 'F1', 'F1', 'F1', 'F2', 'F3', 'F3', 'F3'], 'Value': [1, 2, 4, 5, 2, 15, 16, 18]}
df = pd.DataFrame(data=d)

    Id  Feature   Value
0   1   F1        1
1   1   F1        2
2   1   F1        4
3   1   F1        5
4   2   F2        2
5   3   F3        15
6   3   F3        16
7   3   F3        18

Note: Duplicates are already dropped. Note 2: Features are always the same within an ID, but different IDs may share the same feature.

My goal would be to get this returned:

    Id  Feature   Value
0   1   F1        1
1   1   F1        2
4   2   F2        2
5   3   F3        15
6   3   F3        16

PS: Sorry in advance for any grammar mistakes; English is not my first language.

Answer 1

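Compute the difference between consecutive values within each Id with diff and test it for not equal to 1 with Series.ne. The first row of each group has a missing difference, which also counts as not equal to 1, so every group starts its first run there. Take the cumulative sum with Series.cumsum to number the runs, compare with 1 to select the first run, and filter with boolean indexing: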

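# transform keeps the original index, so the boolean mask aligns with df
# (groupby.apply can return a MultiIndex in pandas 2.x, which breaks the filter)
df = df[df.groupby('Id')['Value'].transform(lambda x: x.diff().ne(1).cumsum()).eq(1)]
print(df)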
   Id Feature  Value
0   1      F1      1
1   1      F1      2
4   2      F2      2
5   3      F3     15
6   3      F3     16
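
To make the one-liner easier to follow, here is a minimal sketch that splits it into separate steps and avoids the lambda. It rebuilds the sample data from the question; the names step, starts_run, run_id and result are illustrative only and do not come from the original answer.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'Id': [1, 1, 1, 1, 2, 3, 3, 3],
    'Feature': ['F1', 'F1', 'F1', 'F1', 'F2', 'F3', 'F3', 'F3'],
    'Value': [1, 2, 4, 5, 2, 15, 16, 18],
})

# Step 1: difference to the previous row within each Id
# (NaN on the first row of every group).
step = df.groupby('Id')['Value'].diff()

# Step 2: 1 wherever a new run starts (first row of a group,
# or a step other than 1), 0 otherwise.
starts_run = step.ne(1).astype(int)

# Step 3: number the runs within each Id; the first run gets 1.
run_id = starts_run.groupby(df['Id']).cumsum()

# Step 4: keep only the rows that belong to the first run of each Id.
result = df[run_id.eq(1)]
print(result)
```

This assumes, as stated in the question, that the frame is already sorted by Id and Value and that duplicates have been dropped. Because every step is a built-in grouped operation, it may also be faster than a Python lambda on large frames, although that has not been measured here.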

solution1
1 ACCEPTED 2021-02-23 12:52:47
