How to remove duplicated rows based on values in column

Good morning. My problem is removing duplicates from a dataframe that are caused by many different values in one of the columns.

The base dataframe looks like this:

[image of the base dataframe]

As you can see, the values in the Name and Id columns are duplicated, once for each Category. The goal is to remove those duplicates while keeping the information about the category.

I would like to get exactly the view below:

[image of the desired result]

I have tried to use the get_dummies method from the pandas library, but I ran into some issues.

dummies = pd.get_dummies(df[['Category']], drop_first=True)
df = pd.concat([df.drop(['Category'], axis=1), dummies], axis=1)

Using the code above, I get the result below:

[image of the result]

The result is basically still the same as the base dataframe.

Do you have any idea how to deal with it?

It depends on what you need. If duplicates per Name and Id are possible, aggregate with max after creating the dummy columns:

# dtype=int keeps the dummy columns as 0/1 (pandas >= 2.0 returns bool by default)
df = (pd.get_dummies(df, columns=['Category'], dtype=int)
        .groupby(['Name', 'Id'], as_index=False)
        .max())
print(df)
  Name  Id  Category_A  Category_B  Category_C
0  ABC   1           1           0           0
1  ABC   2           0           1           0
2  DEF   2           1           0           0
3  GHI   3           0           0           1
4  JKL   4           0           0           1
5  MNO   5           1           0           0
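
To see why get_dummies on its own leaves the duplicates in place, here is a minimal self-contained sketch. The sample data is hypothetical (the original dataframe was only posted as an image), and the names `dummies` and `result` are introduced just for illustration. get_dummies adds one indicator column per category but keeps one row per original row; only the groupby step merges them.

```python
import pandas as pd

# Hypothetical sample data: (ABC, 1) appears once per category it belongs to.
df = pd.DataFrame({
    "Name": ["ABC", "ABC", "DEF", "GHI"],
    "Id": [1, 1, 2, 3],
    "Category": ["A", "B", "A", "C"],
})

# get_dummies only adds indicator columns; the number of rows is unchanged.
dummies = pd.get_dummies(df, columns=["Category"], dtype=int)

# Grouping by Name and Id and taking max merges the indicator rows,
# so a 1 in any of the duplicated rows survives.
result = dummies.groupby(["Name", "Id"], as_index=False).max()
print(result)
```

Note also that drop_first=True, used in the question, drops the indicator column of the first category, so it is left out here to keep a column for every category.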

If you need to aggregate per Id only, taking the last value for non-numeric columns, use:

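import numpy as np

# max for numeric columns, last value for everything else (e.g. Name)
f = lambda x: x.max() if np.issubdtype(x.dtype, np.number) else x.iat[-1]
# dtype=int keeps the dummies numeric; bool columns would not pass the np.number check
df = (pd.get_dummies(df, columns=['Category'], dtype=int)
        .groupby('Id', as_index=False)
        .agg(f))
print(df)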
   Id Name  Category_A  Category_B  Category_C
0   1  ABC           1           0           0
1   2  DEF           1           1           0
2   3  GHI           0           0           1
3   4  JKL           0           0           1
4   5  MNO           1           0           0
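
The per-Id aggregation can be tried end to end with the sketch below. The sample data is again hypothetical, and `max_or_last` is just a named version of the lambda above. One detail worth checking in your own environment: NumPy does not treat bool as a subtype of np.number, so the dummy columns are requested as integers here; with bool dummies the function would fall through to the "last value" branch.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: Id 2 is shared by two names and two categories.
df = pd.DataFrame({
    "Name": ["ABC", "ABC", "DEF", "GHI"],
    "Id": [1, 2, 2, 3],
    "Category": ["A", "B", "A", "C"],
})

def max_or_last(col):
    """Return the max of a numeric column, otherwise its last value."""
    if np.issubdtype(col.dtype, np.number):
        return col.max()
    return col.iat[-1]

# Integer dummies so that the numeric branch above applies to them.
dummies = pd.get_dummies(df, columns=["Category"], dtype=int)

# One row per Id: indicators are combined with max, Name keeps its last value.
result = dummies.groupby("Id", as_index=False).agg(max_or_last)
print(result)
```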

In the second solution it is also possible to specify the aggregation for each column explicitly:

df = pd.get_dummies(df, columns=['Category'], dtype=int)

# 'max' for every column, then override Name with 'last'
d = dict.fromkeys(df.columns, 'max')
d['Name'] = 'last'
print(d)
{'Name': 'last', 'Id': 'max', 'Category_A': 'max', 'Category_B': 'max', 'Category_C': 'max'}

df = df.groupby('Id', as_index=False).agg(d)
print(df)
  Name  Id  Category_A  Category_B  Category_C
0  ABC   1           1           0           0
1  DEF   2           1           1           0
2  GHI   3           0           0           1
3  JKL   4           0           0           1
4  MNO   5           1           0           0
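
A variation on the dictionary approach, shown as a sketch rather than as the answer's own method: build the mapping only for the columns that are actually aggregated, leaving the grouping key Id out of it. The sample data and the name `agg_map` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: Id 2 is shared by two names and two categories.
df = pd.DataFrame({
    "Name": ["ABC", "ABC", "DEF", "GHI"],
    "Id": [1, 2, 2, 3],
    "Category": ["A", "B", "A", "C"],
})

dummies = pd.get_dummies(df, columns=["Category"], dtype=int)

# 'max' for each indicator column, 'last' for Name; the key Id is not listed.
agg_map = {col: "max" for col in dummies.columns if col.startswith("Category_")}
agg_map["Name"] = "last"

result = dummies.groupby("Id", as_index=False).agg(agg_map)
print(result)
```

Listing the columns explicitly makes it easy to switch a single column to a different aggregation (for example 'first' instead of 'last' for Name) without touching the rest.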
