Pandas DataFrame: drop consecutive duplicates

Question

I want to modify the behaviour of drop_duplicates in the following way. For example, I have a DataFrame with these rows:

| A header | Another header |
| -------- | -------------- |
| First    | el1            | 
| Second   | el2            |
| Second   | el8            |
| First    | el3            |
| Second   | el4            |
| Second   | el5            |
| First    | el6            |
| Second   | el9            |

I don't want to drop all duplicates, only consecutive ones. So the result I want is:

| A header | Another header |
| -------- | -------------- |
| First    | el1            | 
| Second   | el2            |
| First    | el3            |
| Second   | el4            |
| First    | el6            |
| Second   | el9            |

I tried to do it with a for loop, but maybe there are better ways.

Answer 1

You can do this with shift(), as follows:

import pandas as pd

df = pd.DataFrame({
    'A header': ['First', 'Second', 'Second', 'First', 'Second', 'Second', 'First', 'Second'],
    'Another header': ['el1', 'el2', 'el8', 'el3', 'el4', 'el5', 'el6', 'el9'],
})

print(df)
"""
  A header Another header
0    First            el1
1   Second            el2
2   Second            el8
3    First            el3
4   Second            el4
5   Second            el5
6    First            el6
7   Second            el9
"""

df2 = df[df['A header'] != df['A header'].shift(1)]

print(df2)
"""
  A header Another header
0    First            el1
1   Second            el2
3    First            el3
4   Second            el4
6    First            el6
7   Second            el9
"""

Using shift(1), you can compare each row with the row before it. A row is kept only when its 'A header' value differs from the previous row's value.

For more information, see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.shift.html
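To make the comparison easier to follow, here is a minimal sketch that splits the one-liner into its intermediate steps. The names `previous`, `keep` and `result` are only illustrative, and the final `reset_index(drop=True)` is an optional extra that is not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "A header": ["First", "Second", "Second", "First", "Second", "Second", "First", "Second"],
    "Another header": ["el1", "el2", "el8", "el3", "el4", "el5", "el6", "el9"],
})

# shift(1) moves every value down by one row; the first row becomes NaN.
previous = df["A header"].shift(1)

# True where a row differs from the row above it.
# The first row is always True because "First" != NaN.
keep = df["A header"].ne(previous)

# Keep only the rows that start a new run, then renumber the index (optional).
result = df[keep].reset_index(drop=True)
print(result)
```

One thing to be aware of: NaN never compares equal to NaN, so consecutive missing values in the compared column would all be kept by this mask.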

Answer 2

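Collect the index labels of the consecutive duplicates:

# df1 is the DataFrame from the question
dup_labels = []
for i in range(len(df1) - 1):
    # compare each row with the next one by position, not by label
    if df1['A header'].iloc[i] == df1['A header'].iloc[i + 1]:
        dup_labels.append(df1.index[i + 1])

Drop them:

df1.drop(dup_labels, inplace=True)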
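The loop above keeps the first row of each run of equal values. If you might want a different row per run (for example the last one), one possible vectorized sketch is to label each run with a cumulative sum and group on that label. The names `run_id`, `first_of_run` and `last_of_run` are hypothetical, and `df1` is assumed to hold the data from the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    "A header": ["First", "Second", "Second", "First", "Second", "Second", "First", "Second"],
    "Another header": ["el1", "el2", "el8", "el3", "el4", "el5", "el6", "el9"],
})

# A new run starts wherever the value differs from the row above,
# so the cumulative sum of that mask gives every run its own number.
run_id = df1["A header"].ne(df1["A header"].shift()).cumsum()

# First row of each run: the same rows the loop keeps.
first_of_run = df1.groupby(run_id).head(1)

# Last row of each run: keeps el8 and el5 instead of el2 and el4.
last_of_run = df1.groupby(run_id).tail(1)

print(first_of_run)
print(last_of_run)
```

Unlike `drop(..., inplace=True)`, this leaves `df1` unchanged and returns new DataFrames with the original index labels.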

Answer 1: 1 2022-12-28 13:34:24

Answer 2: 0 2022-12-28 13:18:49
