
How to keep rows with same ID and consecutive values in Pandas?

I think my problem is easy to understand, but I don't know how to solve it efficiently without loops.

My dataset (already sorted by ID and Value) has an ID column, some feature columns and an integer Value column. For each ID, my goal is to keep the first run of consecutive values, starting from the ID's first row. If an ID has only one row, that row should be kept.

I think it is easier to understand with an example, so let me show you. My dataset looks like this:

import pandas as pd

d = {'Id': [1, 1, 1, 1, 2, 3, 3, 3], 'Feature': ['F1', 'F1', 'F1', 'F1', 'F2', 'F3', 'F3', 'F3'], 'Value': [1, 2, 4, 5, 2, 15, 16, 18]}
df = pd.DataFrame(data=d)

    Id  Feature   Value
0   1   F1        1
1   1   F1        2
2   1   F1        4
3   1   F1        5
4   2   F2        2
5   3   F3        15
6   3   F3        16
7   3   F3        18

Note 1: Duplicates have already been dropped.
Note 2: The Feature is always the same within one ID, but the same Feature can also appear under other IDs.

My goal is to get this result:

    Id  Feature   Value
0   1   F1        1
1   1   F1        2
4   2   F2        2
5   3   F3        15
6   3   F3        16
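For reference, the rule can be pinned down as a plain loop. This is a hypothetical helper, `first_consecutive_run`, that is not part of the question or of pandas. It assumes the frame is sorted by Id and Value as stated above, and it is only meant as a correctness check for a vectorized solution, not as the efficient answer being asked for.

```python
import pandas as pd


def first_consecutive_run(df):
    """Hypothetical loop-based reference (not the vectorized answer).

    For each Id, keep rows starting at the group's first row for as long as
    Value grows by exactly 1 from the previous row. Assumes df is sorted by
    Id and Value, as stated in the question.
    """
    keep = []
    prev_id = None
    prev_value = None
    in_first_run = False
    for idx, id_, value in zip(df.index, df['Id'], df['Value']):
        if id_ != prev_id:
            # The first row of a new Id always belongs to its first run
            in_first_run = True
        elif in_first_run and value != prev_value + 1:
            # A gap ends the first run; the remaining rows of this Id are dropped
            in_first_run = False
        if in_first_run:
            keep.append(idx)
        prev_id, prev_value = id_, value
    return df.loc[keep]


d = {'Id': [1, 1, 1, 1, 2, 3, 3, 3],
     'Feature': ['F1', 'F1', 'F1', 'F1', 'F2', 'F3', 'F3', 'F3'],
     'Value': [1, 2, 4, 5, 2, 15, 16, 18]}
df = pd.DataFrame(data=d)
result = first_consecutive_run(df)
print(result)
```

On the example data this keeps rows 0, 1, 4, 5 and 6, which matches the desired output.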

PS: Sorry in advance for any grammar mistakes; English is not my first language.

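Within each Id group, use Series.diff to get the difference from the previous row and compare it for not equal to 1 with ne(1). The first row of each group has no previous row, so its difference is NaN, and NaN.ne(1) is True, which means every group's first row starts a new run. Then use the cumulative sum with Series.cumsum to number the runs inside the group, compare with eq(1) to keep only the first run, and filter with boolean indexing: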

# transform returns a Series aligned to df's original row index, so it works as a boolean mask
df = df[df.groupby('Id')['Value'].transform(lambda x: x.diff().ne(1).cumsum()).eq(1)]
print(df)
   Id Feature  Value
0   1      F1      1
1   1      F1      2
4   2      F2      2
5   3      F3     15
6   3      F3     16
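To see why the filter works, the intermediate results can be printed step by step. This is an illustrative sketch that assumes `df` is the example frame from the question; the helper columns `diff`, `new_run` and `run` are hypothetical names added only for display. The last step builds the same mask using only built-in GroupBy methods (`diff` and `cumsum`), so no Python lambda is called once per group, which tends to matter when there are many Ids.

```python
import pandas as pd

d = {'Id': [1, 1, 1, 1, 2, 3, 3, 3],
     'Feature': ['F1', 'F1', 'F1', 'F1', 'F2', 'F3', 'F3', 'F3'],
     'Value': [1, 2, 4, 5, 2, 15, 16, 18]}
df = pd.DataFrame(data=d)

steps = df.copy()
# Difference to the previous row within the same Id; NaN on each group's first row
steps['diff'] = df.groupby('Id')['Value'].diff()
# NaN != 1 is True, so every group's first row starts a new run
steps['new_run'] = steps['diff'].ne(1)
# Number the runs within each Id: 1 for the first run, 2 for the next, ...
steps['run'] = steps['new_run'].astype(int).groupby(df['Id']).cumsum()
print(steps)

# Same mask in one expression, using only built-in GroupBy methods (no lambda)
mask = (df.groupby('Id')['Value'].diff()
          .ne(1)
          .astype(int)
          .groupby(df['Id'])
          .cumsum()
          .eq(1))
print(df[mask])
```

On the example data, `run` is 1, 1, 2, 2 for Id 1, 1 for Id 2 and 1, 1, 2 for Id 3, so keeping `run == 1` selects rows 0, 1, 4, 5 and 6.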
