
Remove duplicates of pandas df

Trying to use the DataFrame.drop_duplicates parameters, but without luck, as the duplicates are not being removed.

Looking to remove duplicates based on column "inc_id". If duplicates are found in that column, only the last row should be kept.

My df is:

    inc_id  inc_cr_date
0   1049670 121
1   1049670 55
2   1049667 121
3   1049640 89
4   1049666 12
5   1049666 25

Output should be:

    inc_id  inc_cr_date
0   1049670 55
1   1049667 121
2   1049640 89
3   1049666 25

Code is:

df = df.drop_duplicates(subset='inc_id', keep="last")

Any idea what I am missing here? Thanks.

I think you are just looking to drop the original index:

In [11]: df.drop_duplicates(subset='inc_id', keep="last").reset_index(drop=True)
Out[11]:
    inc_id  inc_cr_date
0  1049670           55
1  1049667          121
2  1049640           89
3  1049666           25
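For reference, here is a minimal self-contained sketch that rebuilds the example DataFrame from the question (the constructor call is only for illustration) and applies the suggested fix:

import pandas as pd

# Rebuild the example DataFrame from the question
df = pd.DataFrame({
    'inc_id':      [1049670, 1049670, 1049667, 1049640, 1049666, 1049666],
    'inc_cr_date': [121, 55, 121, 89, 12, 25],
})

# Keep only the last occurrence of each inc_id, then renumber the index 0..n-1
df = df.drop_duplicates(subset='inc_id', keep='last').reset_index(drop=True)
print(df)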

For a dataframe df, duplicate rows can be dropped using this code.

import pandas as pd

df = pd.read_csv('./data/data-set.csv')
print(df['text'])

def clean_data(dataframe):
    # Drop duplicate rows in place, based on the 'text' column
    dataframe.drop_duplicates(subset='text', inplace=True)

clean_data(df)
print(df['text'])
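If mutating the caller's DataFrame is not desired, a variant of the same helper (hypothetical name clean_data_copy, using keep='last' to mirror the question's requirement) could return a new DataFrame instead of using inplace=True:

def clean_data_copy(dataframe):
    # Return a new DataFrame with duplicate 'text' rows removed,
    # keeping the last occurrence and resetting the index to 0..n-1
    return dataframe.drop_duplicates(subset='text', keep='last').reset_index(drop=True)

df = clean_data_copy(df)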
