How to remove duplicates in pandas

Question

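I need to use pandas to check whether one column of a DataFrame contains duplicate values and, if it does, drop the entire row for each duplicate. Only the first column needs to be checked.

Example: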

object    type

apple     fruit
ball      toy
banana    fruit
xbox      videogame
banana    fruit
apple     fruit

What I need is:

object    type

apple     fruit
ball      toy
banana    fruit
xbox      videogame

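I can remove the duplicates from the "object" values with the code below, but I cannot remove the whole row that contains the duplicate, because the second column is not removed.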


import pandas as pd

df = pd.read_csv(directory, header=None)

objects = df[0]

for object in df[0]:

Answer 1

Build a boolean mask with duplicated() and negate it to select the rows to keep:

df = df[~df["object"].duplicated()]

This gives:

   object       type
0   apple      fruit
1    ball        toy
2  banana      fruit
3    xbox  videogame
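
To make the mask visible, here is a minimal sketch that rebuilds the question's sample data and inspects the result of duplicated() before negating it. The variable names mask, first_kept and last_kept are illustrative and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "object": ["apple", "ball", "banana", "xbox", "banana", "apple"],
    "type": ["fruit", "toy", "fruit", "videogame", "fruit", "fruit"],
})

# True for every repeat of a value already seen higher up in the column.
mask = df["object"].duplicated()

# Negating the mask keeps the first occurrence of each object.
first_kept = df[~mask]

# keep="last" flags the earlier occurrences instead, so the last one survives.
last_kept = df[~df["object"].duplicated(keep="last")]

print(mask.tolist())
print(first_kept)
print(last_kept)
```

The default is keep="first", which is why the rows at index 4 and 5 are the ones dropped in the output above.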

Answer 2

Use the drop_duplicates method:

d = pd.DataFrame(
    {'object': ['apple', 'ball', 'banana', 'xbox', 'banana', 'apple'],
    'type': ['fruit', 'toy', 'fruit', 'videogame', 'fruit', 'fruit']}
)
d.drop_duplicates()

It takes several keyword arguments that may come in handy (for example, inplace=True if you want to update your DataFrame d in place).
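
As a rough illustration of those keyword arguments, the sketch below applies keep, ignore_index and inplace to the same DataFrame d. The result names unique_only, renumbered and result are made up for this example.

```python
import pandas as pd

d = pd.DataFrame(
    {'object': ['apple', 'ball', 'banana', 'xbox', 'banana', 'apple'],
     'type': ['fruit', 'toy', 'fruit', 'videogame', 'fruit', 'fruit']}
)

# keep=False drops every row that has a duplicate, leaving only unique rows.
unique_only = d.drop_duplicates(keep=False)

# keep="last" retains the last occurrence; ignore_index=True renumbers
# the surviving rows from 0 instead of keeping the original labels.
renumbered = d.drop_duplicates(keep="last", ignore_index=True)

# inplace=True modifies d itself and returns None.
result = d.drop_duplicates(inplace=True)

print(unique_only)
print(renumbered)
print(result, len(d))
```

Because inplace=True returns None, assigning its result back to d would discard the DataFrame; either call it for its side effect or reassign the return value of the non-inplace call.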

Answer 3

You can call .drop_duplicates() with the parameter subset='object' to select the column to check, as follows:

df_out = df.drop_duplicates(subset='object')

Result:

print(df_out)

   object       type
0   apple      fruit
1    ball        toy
2  banana      fruit
3    xbox  videogame
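
The question loads the file with header=None, in which case the columns are labelled 0 and 1 rather than 'object' and 'type'. A minimal sketch of the same idea that picks the first column by position is shown below. The in-memory csv_text is a hypothetical stand-in for the asker's file, whose path is not given.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for the asker's file.
csv_text = (
    "apple,fruit\n"
    "ball,toy\n"
    "banana,fruit\n"
    "xbox,videogame\n"
    "banana,fruit\n"
    "apple,fruit\n"
)

df = pd.read_csv(io.StringIO(csv_text), header=None)

# With header=None the columns are labelled 0, 1, ...;
# df.columns[0] is the label of the first column whatever it is called.
first_col = df.columns[0]

# Only the first column is compared, but whole rows are dropped.
df_out = df.drop_duplicates(subset=[first_col])

print(df_out)
```

Using df.columns[0] instead of a hard-coded name keeps the call working whether or not the file has a header row.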

Answer 4

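To count how many duplicate rows there are, compare the length before and after removing duplicates. Store the result in a separate variable so the DataFrame itself is not overwritten:

n_duplicates = len(df) - len(df.drop_duplicates())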
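
The same count can also be read directly from the boolean mask that duplicated() returns. The sketch below, on the question's sample data, compares the two approaches; all variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "object": ["apple", "ball", "banana", "xbox", "banana", "apple"],
    "type": ["fruit", "toy", "fruit", "videogame", "fruit", "fruit"],
})

# Length difference: rows that are exact copies of an earlier row.
n_by_length = len(df) - len(df.drop_duplicates())

# Equivalent count: each True in the mask is one duplicate row.
n_by_mask = int(df.duplicated().sum())

# Restrict the comparison to the "object" column only.
n_object_dups = int(df.duplicated(subset="object").sum())

print(n_by_length, n_by_mask, n_object_dups)
```

None of these calls modifies df, so the rows still have to be dropped in a separate step.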

4 answers

Answer 1
0 Accepted 2021-06-15 15:43:14

Answer 2
0 2021-06-15 15:45:49

Answer 3
0 2021-06-15 15:47:15

Answer 4
0 2022-09-15 04:55:13
