Pandas DataFrame drop consecutive duplicates
I want to modify the behavior of drop_duplicates in the following way. For example, I have a DataFrame with these rows:
| A header | Another header |
| -------- | -------------- |
| First | el1 |
| Second | el2 |
| Second | el8 |
| First | el3 |
| Second | el4 |
| Second | el5 |
| First | el6 |
| Second | el9 |
I don't need to drop all duplicates, only consecutive ones. So the result I want is:
| A header | Another header |
| -------- | -------------- |
| First | el1 |
| Second | el2 |
| First | el3 |
| Second | el4 |
| First | el6 |
| Second | el9 |
I tried to do it with a for loop, but maybe there are better ways.
You can do this with shift(), as follows:
import pandas as pd
df = pd.DataFrame({
'A header': ['First', 'Second', 'Second', 'First', 'Second', 'Second', 'First', 'Second'],
'Another header': ['el1', 'el2', 'el8', 'el3', 'el4', 'el5', 'el6', 'el9'],
})
print(df)
"""
A header Another header
0 First el1
1 Second el2
2 Second el8
3 First el3
4 Second el4
5 Second el5
6 First el6
7 Second el9
"""
df2 = df[df['A header'] != df['A header'].shift(1)]
print(df2)
"""
A header Another header
0 First el1
1 Second el2
3 First el3
4 Second el4
6 First el6
7 Second el9
"""
With shift(1), each row is compared with the row before it, so a row is kept only when its 'A header' value differs from the previous row's value.
For more information, see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.shift.html
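To make the comparison easier to follow, here is a minimal sketch that splits the one-liner into its intermediate steps. The names `prev`, `keep`, and `result` are only illustrative, and the final `reset_index(drop=True)` is an optional extra that is not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'A header': ['First', 'Second', 'Second', 'First', 'Second', 'Second', 'First', 'Second'],
    'Another header': ['el1', 'el2', 'el8', 'el3', 'el4', 'el5', 'el6', 'el9'],
})

# Value of the previous row; the first row has no predecessor, so it gets NaN
prev = df['A header'].shift(1)

# True wherever the value differs from the previous row.
# The first row compares against NaN, which is never equal, so it is always kept.
keep = df['A header'] != prev

# Filter with the boolean mask; reset_index is optional and only renumbers the rows
result = df[keep].reset_index(drop=True)
print(result)
```

Without `reset_index`, the original row labels (0, 1, 3, 4, 6, 7) are preserved, which can be useful if you need to trace rows back to the source DataFrame.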
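Collect the index labels of the consecutive duplicates (df1 is the DataFrame from the question):
l = []
for i in range(len(df1) - 1):
    # Compare each row with the next one by position
    if df1['A header'].iloc[i] == df1['A header'].iloc[i + 1]:
        l.append(df1.index[i + 1])  # remember the label of the repeated row
Drop the duplicates: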
df1.drop(l, inplace=True)
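If the same filtering is needed in several places, or across more than one column, it may be convenient to wrap the shift-based comparison in a function. The sketch below is one possible way to do that; `drop_consecutive_duplicates` and its `subset` parameter are hypothetical names, not part of pandas. It assumes the compared columns contain no missing values, since NaN never compares equal to NaN and consecutive NaN rows would therefore all be kept.

```python
import pandas as pd

def drop_consecutive_duplicates(df, subset):
    """Hypothetical helper: drop a row only if it matches the previous row
    in every column listed in `subset`."""
    cols = df[subset]
    # A row is kept when at least one compared column differs from the row above
    changed = (cols != cols.shift(1)).any(axis=1)
    return df[changed]

df1 = pd.DataFrame({
    'A header': ['First', 'Second', 'Second', 'First', 'Second', 'Second', 'First', 'Second'],
    'Another header': ['el1', 'el2', 'el8', 'el3', 'el4', 'el5', 'el6', 'el9'],
})

# Compare on 'A header' only: rows 2 and 5 repeat the row above them and are dropped
out = drop_consecutive_duplicates(df1, ['A header'])
print(out)

# Compare on both columns: no two adjacent rows match in both, so nothing is dropped
out_both = drop_consecutive_duplicates(df1, ['A header', 'Another header'])
print(len(out_both))
```

Unlike `df1.drop(l, inplace=True)`, this returns a new DataFrame and leaves `df1` unchanged.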