How to drop duplicates in Python if consecutive values are the same in two columns?

Question

I have a dataframe like below:

I want to drop duplicates, keeping only the first row of each consecutive run in B when the value in C is also the same. E.g. here the value '9' in column B is repeated, and the corresponding values in column 'C' are also repeated ('45'). In this case I want to retain the first occurrence.

Expected Output:

I tried a group by, but did not know how to drop the rows.

code:

df['consecutive'] = (df['B'] != df['B'].shift(1)).cumsum()
test=df.groupby('consecutive',as_index=False).apply(lambda x: (x['B'].head(1),x.shape[0],
                                                       x['C'].iloc[-1] - x['C'].iloc[0]))

This group by returns a Series, but I want to drop the duplicate rows instead.

Answer 1

Use DataFrame.drop_duplicates on two columns, the run identifier and C:

df['consecutive'] = (df['B'] != df['B'].shift(1)).cumsum()
df = df.drop_duplicates(['consecutive','C'])
print (df)
   A   B   C  consecutive
0  1   8  23            1
1  2   8  22            1
2  3   9  45            2
4  5   6  12            3
5  6   4  10            4
6  7  11  12            5

Or chain both conditions with | (bitwise OR):

df = df[(df['B'] != df['B'].shift()) | (df['C'] != df['C'].shift())]
print (df)
   A   B   C
0  1   8  23
1  2   8  22
2  3   9  45
4  5   6  12
5  6   4  10
6  7  11  12
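For anyone who wants to run the snippets above, here is a self-contained sketch. The input frame is an assumption reconstructed from the printed output (row 3, which the output omits, is taken to be A=4, B=9, C=45 as the question describes). It also illustrates a difference between the two variants that the sample data does not exercise: drop_duplicates on ['consecutive', 'C'] removes any repeated C within a run of equal B, even when the repeats are not adjacent, while the shift-based mask only compares each row with the one directly before it.

```python
import pandas as pd

# Sample data reconstructed from the output shown above (an assumption).
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6, 7],
    'B': [8, 8, 9, 9, 6, 4, 11],
    'C': [23, 22, 45, 45, 12, 10, 12],
})

# Variant 1: label each run of equal B values, de-duplicate per run,
# then drop the helper column again.
run_id = (df['B'] != df['B'].shift()).cumsum()
v1 = (df.assign(consecutive=run_id)
        .drop_duplicates(['consecutive', 'C'])
        .drop(columns='consecutive'))

# Variant 2: keep a row if B or C differs from the previous row.
v2 = df[(df['B'] != df['B'].shift()) | (df['C'] != df['C'].shift())]

print(v1.equals(v2))  # both variants agree on the sample data

# Hypothetical edge case: C repeats inside one run of B, but not adjacently.
edge = pd.DataFrame({'B': [8, 8, 8], 'C': [1, 2, 1]})
edge_run = (edge['B'] != edge['B'].shift()).cumsum()
e1 = edge.assign(consecutive=edge_run).drop_duplicates(['consecutive', 'C'])
e2 = edge[(edge['B'] != edge['B'].shift()) | (edge['C'] != edge['C'].shift())]

print(len(e1), len(e2))  # variant 1 keeps 2 rows, variant 2 keeps 3
```

Which behaviour is wanted depends on whether "consecutive" should apply to B alone or to the (B, C) pair.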

Answer 2

A one-liner to filter out such records is:

df[(df[['B', 'C']].shift() != df[['B', 'C']]).any(axis=1)]

Here we check whether the values in columns ['B', 'C'] are the same as in the previous (shifted) row; if any of them differs, we retain the row:

>>> df[(df[['B', 'C']].shift() != df[['B', 'C']]).any(axis=1)]
   A   B   C
0  1   8  23
1  2   8  22
2  3   9  45
4  5   6  12
5  6   4  10
6  7  11  12

This scales well, since we can define a function that operates on an arbitrary number of columns:

def drop_consecutive_duplicates(df, *colnames):
    dff = df[list(colnames)]
    return df[(dff.shift() != dff).any(axis=1)]

So you can then filter with:

drop_consecutive_duplicates(df, 'B', 'C')
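As a quick check that the helper is not tied to numeric data, the sketch below applies it to a hypothetical frame with a string column (the names events, key and val are made up for illustration). Because the comparison uses shift() and != rather than arithmetic, it should work for any dtype that supports equality. One caveat: NaN never compares equal to NaN, so consecutive rows with missing values in the compared columns are all kept.

```python
import pandas as pd

def drop_consecutive_duplicates(df, *colnames):
    dff = df[list(colnames)]
    return df[(dff.shift() != dff).any(axis=1)]

# Hypothetical data: row 1 repeats row 0; row 3 repeats row 0 but is not adjacent.
events = pd.DataFrame({
    'key': ['x', 'x', 'y', 'x'],
    'val': [1, 1, 1, 1],
})
kept = drop_consecutive_duplicates(events, 'key', 'val')
print(kept.index.tolist())  # [0, 2, 3]

# The original index labels survive; renumber them if a clean 0..n-1 index is wanted.
renumbered = kept.reset_index(drop=True)

# NaN caveat: NaN != NaN is True, so neither of these rows is dropped.
with_nan = pd.DataFrame({'B': [float('nan'), float('nan')], 'C': [1, 1]})
kept_nan = drop_consecutive_duplicates(with_nan, 'B', 'C')
print(len(kept_nan))  # 2
```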

Answer 3

You can compute a series of the rows to drop, and then drop them:

to_drop = (df['B'] == df['B'].shift()) & (df['C'] == df['C'].shift())
df = df[~to_drop]

It gives the expected result:

   A   B   C
0  1   8  23
1  2   8  22
2  3   9  45
4  5   6  12
5  6   4  10
6  7  11  12
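The to_drop mask is the logical complement of the "keep" condition from Answer 1: by De Morgan's laws, not (X and Y) is the same as (not X) or (not Y). A small sketch to confirm this, using sample data reconstructed from the output above (so the input values are an assumption):

```python
import pandas as pd

# Sample data reconstructed from the printed output (an assumption).
df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6, 7],
    'B': [8, 8, 9, 9, 6, 4, 11],
    'C': [23, 22, 45, 45, 12, 10, 12],
})

# Rows where both B and C equal the previous row's values.
to_drop = (df['B'] == df['B'].shift()) & (df['C'] == df['C'].shift())

# Rows where B or C differs from the previous row.
keep = (df['B'] != df['B'].shift()) | (df['C'] != df['C'].shift())

print((~to_drop).equals(keep))     # True
print(df.index[to_drop].tolist())  # [3]
```

Building the "drop" mask first can be handy when you want to inspect the rows that are about to be removed before filtering.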

Answer 4

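A simple approach is to take the difference between consecutive rows of B and C, then drop the rows where both differences are 0 (i.e. the values are repeated). The code is:

df[~((df.B.diff() == 0) & (df.C.diff() == 0))]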

Answer 5

Code

df1 = df.drop_duplicates(subset=['B', 'C'])

Result

   A   B   C
0  1   8  23
1  2   8  22
2  3   9  45
4  5   6  12
5  6   4  10
6  7  11  12
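Note that drop_duplicates with subset=['B', 'C'] removes every repeated (B, C) pair, not only adjacent ones; it matches the expected output here only because the sample data happens to contain no non-adjacent repeats. A minimal sketch of the difference, using a hypothetical frame in which the pair (8, 23) reappears later:

```python
import pandas as pd

# Hypothetical data: (9, 45) repeats on adjacent rows, (8, 23) repeats non-adjacently.
df = pd.DataFrame({'B': [8, 9, 9, 8], 'C': [23, 45, 45, 23]})

# Global de-duplication: the last row is dropped although it is not consecutive.
global_dedup = df.drop_duplicates(subset=['B', 'C'])

# Consecutive-only de-duplication: compare each row with the one before it.
cols = df[['B', 'C']]
consecutive_only = df[(cols != cols.shift()).any(axis=1)]

print(global_dedup.index.tolist())      # [0, 1]
print(consecutive_only.index.tolist())  # [0, 1, 3]
```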

Answer 6

If I understand your question correctly, given the following dataframe:

df = pd.DataFrame({'B': [8, 8, 9, 9, 6, 4, 11], 'C': [22, 23, 45, 45, 12, 10, 12],})

This one-liner solves the problem using the drop_duplicates method:

df.drop_duplicates(['B', 'C'])

It gives the expected result:

    B   C
0   8  22
1   8  23
2   9  45
4   6  12
5   4  10
6  11  12

Answer 7

Using diff, ne and any over axis=1:

Note: this method only works for numeric columns

m = df[['B', 'C']].diff().ne(0).any(axis=1)
print(df[m])

Output

   A   B   C
0  1   8  23
1  2   8  22
2  3   9  45
4  5   6  12
5  6   4  10
6  7  11  12

Details

df[['B', 'C']].diff()

     B     C
0  NaN   NaN
1  0.0  -1.0
2  1.0  23.0
3  0.0   0.0
4 -3.0 -33.0
5 -2.0  -2.0
6  7.0   2.0

Then we check if any of the values in a row are not equal (ne) to 0:

df[['B', 'C']].diff().ne(0).any(axis=1)

0     True
1     True
2     True
3    False
4     True
5     True
6     True
dtype: bool
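To illustrate the note about numeric columns: diff() subtracts consecutive values, so it is not defined for strings. The sketch below uses a hypothetical object-dtype column (the name labels and its values are made up for illustration) and falls back to a shift()-based comparison, which only needs equality.

```python
import pandas as pd

# Hypothetical non-numeric column, forced to object dtype for the demonstration.
labels = pd.DataFrame({'B': pd.Series(['a', 'a', 'b'], dtype=object)})

try:
    labels['B'].diff()
    diff_supported = True
except TypeError:
    # Subtracting one str from another is undefined, so diff() fails here.
    diff_supported = False

# Equality-based alternative: compare each row with the previous one.
mask = labels[['B']].ne(labels[['B']].shift()).any(axis=1)

print(diff_supported)                # False
print(labels[mask].index.tolist())   # [0, 2]
```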

7 answers

solution1
2 ACCEPTED 2019-09-18 09:21:05

solution2
0 2019-09-18 09:25:20

solution3
0 2019-09-18 09:25:37

solution4
0 2019-09-18 09:27:06

solution5
0 2019-09-18 09:28:26

solution6
0 2019-09-18 09:37:08

solution7
0 2019-09-18 09:55:18
