How to sort values and remove duplicates within a Pandas DataFrame column

Question

This is probably a very basic question, but I haven't been able to find the answer, so here goes...

Question:

Is there an easy way to sort the values alphabetically while also removing any duplicate instances?

Here's what I have:

import pandas as pd

data = ['Car | Book | Apple', '', 'Book | Car | Apple | Apple']
df = pd.DataFrame(data, columns=['Labels'])
print(df)

    Labels
0   Car | Book | Apple
1   
2   Book | Car | Apple | Apple

Desired Output:

    Labels
0   Apple | Book | Car
1   
2   Apple | Book | Car

Thanks!

Answer 1

Use str.join after str.split. Note that a set has no guaranteed order, so the items may not always come out sorted as they do below:

df=df.replace({' ':''},regex=True)
df.Labels.str.split('|').apply(set).str.join('|')
Out[339]: 
0    Apple|Book|Car
1                  
2    Apple|Book|Car
Name: Labels, dtype: object

Based on the comment, adding sorted:

df.Labels.str.split('|').apply(lambda x : sorted(set(x),reverse=False)).str.join(' | ')
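Put together, the two steps above might look like the following self-contained sketch. It assumes the labels themselves never contain spaces, since stripping every space would also merge a multi-word label such as "Ice Cream"; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Labels': ['Car | Book | Apple', '', 'Book | Car | Apple | Apple']})

# Remove the spaces around the separator (assumes labels contain no spaces).
labels = df['Labels'].str.replace(' ', '', regex=False)

# split -> deduplicate with set -> sort explicitly -> join back with ' | '.
result = (
    labels.str.split('|')
    .apply(lambda parts: sorted(set(parts)))
    .str.join(' | ')
)
print(result.tolist())
# ['Apple | Book | Car', '', 'Apple | Book | Car']
```

The explicit sorted call is what makes the order deterministic; set alone only removes the duplicates.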

Answer 2

One way is to use pd.Series.map with sorted and set after splitting by |:

import pandas as pd

data = ['Car | Book | Apple','','Book | Car | Apple | Apple']
df = pd.DataFrame(data,columns=['Labels'])

df['Labels'] = df['Labels'].map(lambda x: ' | '.join(sorted(set(x.split(' | ')))))

#                Labels
# 0  Apple | Book | Car
# 1                    
# 2  Apple | Book | Car
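The lambda above calls x.split on every element, so it would fail if the column held a missing value rather than an empty string. A minimal sketch of one way to guard against that, assuming missing labels should simply stay missing; the helper name normalize_labels and the sample Series are hypothetical and not part of the original question:

```python
import pandas as pd

def normalize_labels(labels: str) -> str:
    """Deduplicate and alphabetically sort a ' | '-separated label string."""
    return ' | '.join(sorted(set(labels.split(' | '))))

# Hypothetical data with a missing value in place of the empty string.
s = pd.Series(['Car | Book | Apple', None, 'Book | Car | Apple | Apple'])

# na_action='ignore' skips missing values instead of passing them to the function.
out = s.map(normalize_labels, na_action='ignore')
print(out)
```

With na_action='ignore', the helper is only called on real strings, so the missing entry passes through untouched.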

Answer 3

df['Labels'].str.split('|') will split the string on | and return a list:

#0             [Car ,  Book ,  Apple]
#1                                 []
#2    [Book ,  Car ,  Apple ,  Apple]
#Name: Labels, dtype: object

Notice the extra spaces in the resulting list elements. One way to remove them is to apply str.strip() to each element in the list:

df['Labels'].str.split('|').apply(lambda x: list(map(str.strip, x)))
#0           [Car, Book, Apple]
#1                           []
#2    [Book, Car, Apple, Apple]
#Name: Labels, dtype: object
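As an alternative to stripping after the split (not what this answer does), the whitespace can be absorbed by the separator itself by splitting on a regular expression. This sketch assumes pandas 1.4 or later, where str.split accepts the regex keyword:

```python
import pandas as pd

df = pd.DataFrame({'Labels': ['Car | Book | Apple', '', 'Book | Car | Apple | Apple']})

# Split on '|' together with any whitespace around it, so no strip is needed.
parts = df['Labels'].str.split(r'\s*\|\s*', regex=True)
print(parts.tolist())
# [['Car', 'Book', 'Apple'], [''], ['Book', 'Car', 'Apple', 'Apple']]
```

The empty string yields a list holding one empty string, which pandas displays as [] when printing the Series.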

Finally, we apply the set constructor to remove duplicates, sort the values, and join them back together using " | " as a separator:

df['Labels'] = df['Labels'].str.split('|').apply(
    lambda x: " | ".join(sorted(set(map(str.strip, x))))
)
print(df)
#               Labels
#0  Apple | Book | Car
#1                    
#2  Apple | Book | Car
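To see what the lambda does to a single cell, the same steps can be traced in plain Python on one value from the question. This is only an illustrative walk-through, and the variable names are arbitrary:

```python
value = 'Book | Car | Apple | Apple'

parts = value.split('|')               # ['Book ', ' Car ', ' Apple ', ' Apple']
stripped = [p.strip() for p in parts]  # ['Book', 'Car', 'Apple', 'Apple']
unique_sorted = sorted(set(stripped))  # ['Apple', 'Book', 'Car']
result = ' | '.join(unique_sorted)

print(result)
# Apple | Book | Car
```

Note that sorted compares strings case-sensitively, so uppercase labels sort before lowercase ones unless a key such as str.lower is passed.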

Answer 1: score 3, accepted, 2018-03-16 18:10:58
Answer 2: score 3, 2018-03-16 18:21:27
Answer 3: score 2, 2018-03-16 18:03:57
