Remove duplicates from a data frame column in list format

Question

I have a ton of duplicate values within the rows of a data frame column. A sample is below. I looked at other Stack Overflow questions, but I could only find answers for plain lists, not for removing dupes inside a data frame. When I pass the values as a list, I am able to remove the duplicates; however, when I pass them as a data frame column, I get an error: TypeError: unhashable type: 'list'

What am I doing wrong here?

import pandas as pd 
d = {'col1': ['apples are delicious,apples are delicious,apples', 'apples'], 'col2': ['mangoes','oranges']}
df = pd.DataFrame(data=d)
df['col1'] = set(df['col1'].str.split(",")) #error tried list(set()) as well.
df['col2'] = df['col2'].str.split(",") #converting to list
print(df)

The final output should have the dupes removed, like this:

col1                                         col2
['apples are delicious','apples']            ['mangoes']
['apples']                                   ['oranges']

Answer 1

You are using set on an entire series, whereas you need to apply set to each element in the series. For this, you can use pd.Series.map:

df['col1'] = df['col1'].str.split(',').map(set)

print(df)

                             col1       col2
0  {apples are delicious, apples}  [mangoes]
1                        {apples}  [oranges]

Your error derives from the fact that you can't have a set of lists, since lists are not hashable.
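To make the hashability point concrete, here is a minimal sketch that reproduces the failure on a small stand-in series and then applies set per element instead. The sample values and the variable names (s, error_message, per_row) are assumptions chosen for illustration, not part of the original question.

```python
import pandas as pd

# After str.split, every element of the series is a list of strings.
s = pd.Series(['a,a,b', 'b']).str.split(',')

# set() over the whole series tries to hash each element.
# Each element is a list, and lists are not hashable, so this raises TypeError.
try:
    set(s)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Applying set to each element works: the strings inside each list are hashable.
per_row = s.map(set)
print(per_row.tolist())
```

The first call fails because the elements being collected are lists; the second succeeds because each set is built from the strings inside one row.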

If you really need a series of lists as the result, you can use the same method again, i.e. df['col1'].str.split(',').map(set).map(list). But note that you should assume no ordering, as set is an unordered collection.
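If the original order of the items matters, one possible alternative (not part of the answer above) is to de-duplicate with dict.fromkeys, which keeps the first occurrence of each item in insertion order on Python 3.7+. The helper name dedupe_keep_order is hypothetical, and the sketch assumes every value in the column is a string with no missing entries.

```python
import pandas as pd

d = {'col1': ['apples are delicious,apples are delicious,apples', 'apples'],
     'col2': ['mangoes', 'oranges']}
df = pd.DataFrame(data=d)


def dedupe_keep_order(items):
    """Return the unique items of a list, keeping first-seen order."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


# Split each string into a list, then de-duplicate each list separately.
df['col1'] = df['col1'].str.split(',').map(dedupe_keep_order)
df['col2'] = df['col2'].str.split(',')

print(df)
```

This yields lists rather than sets, which matches the shape of the output requested in the question.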

Solution 1
2 · Accepted · 2018-08-21 13:38:34
