How to drop duplicates in pandas without dropping NaN values

Question

I have a dataframe that I query, and I want to keep only the rows with unique values in a certain column.
I tried the following code:

    import pandas as pd
    database = pd.read_csv(db_file, sep='\t')
    query = database.loc[database[db_specification[0]].isin(elements)].drop_duplicates(subset=db_specification[1])

db_specification is just a list containing the two columns that I query.
Some of the values are NaN, and I don't want them to be treated as duplicates of each other. How can I achieve that?
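
For reference, a minimal sketch of the behavior in question. The frame `df` and the column names `key` and `val` are illustrative assumptions, not taken from the original data. `drop_duplicates` treats NaN values as equal to each other, so only the first NaN row survives:

```python
import numpy as np
import pandas as pd

# Hypothetical data: "val" contains one repeated value and two NaN entries.
df = pd.DataFrame({
    "key": ["a", "b", "c", "d", "e"],
    "val": [1.0, 1.0, np.nan, np.nan, 2.0],
})

# NaN values compare as duplicates of each other here,
# so the second NaN row is dropped along with the repeated 1.0.
deduped = df.drop_duplicates(subset="val")
nan_rows_kept = int(deduped["val"].isna().sum())
```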

Answer 1

You can start by selecting all the rows where the column is NaN, then drop duplicates on the rest of the dataframe and concatenate the two parts.

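    # `data` is the filtered dataframe; `col` is the column to de-duplicate on
    col = db_specification[1]
    mask = data[col].isna()  # row-wise mask: True where the column is NaN
    data = pd.concat([data[mask], data[~mask].drop_duplicates(subset=col)])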
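
The idea can be tried end to end on a small frame. This is only a sketch under stated assumptions: the data and the names `key`, `val`, `concat_result`, and `mask_result` are hypothetical. It also shows an alternative that is not part of the answer above: a boolean mask built from `duplicated`, which keeps the rows in their original order.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "val" is the column to de-duplicate on.
data = pd.DataFrame({
    "key": ["a", "b", "c", "d", "e"],
    "val": [1.0, 1.0, np.nan, np.nan, 2.0],
})
col = "val"

# Split approach: set the NaN rows aside, de-duplicate the rest, recombine.
# The NaN rows end up first in the result.
mask = data[col].isna()
concat_result = pd.concat([data[mask], data[~mask].drop_duplicates(subset=col)])

# Alternative (an assumption, not from the answer): keep every first
# occurrence plus every NaN row. The original row order is preserved.
keep = ~data.duplicated(subset=col) | data[col].isna()
mask_result = data[keep]
```

Both results contain the same rows. If row order matters, the mask version avoids a later `sort_index()` call.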

Answer 1 accepted 2020-08-13 11:26:08
