How to remove duplicate values from an Excel column in Python while keeping the original order?
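I have a DataFrame whose column values contain various duplicates (the same word appearing more than once in a cell), and I want to remove them across the whole DataFrame, which has thousands of rows.
The data in the Excel file looks like this: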
index ColumnA
0 6-1/2" CAT, SMELLS, BAD, XS, A-403 -316L, 4" CAT TAIL
1 5-1/2' DOG, ROUND HEAD, SLIM, 60 LB, A-182 dog
2 1/2" Pipe, W/VALVE, Broken sides - packaging open, PIPE, Like NEW
3 6" WEDDING RING, 1 ct, RF, 1/2" WIDE, Diamond MISC, Wedding Ring
4 5' Ladder, 50LB, new, 1/2' STEPS, 316L -, with packaging, 5' ladder
I tried:

```python
def removeduplicates(str):
    t=""
    for t in str:
        if(i in t):
            pass
        else:
            t =t+1
```

and also:

```python
df['columnA'].apply(lambda cell: set([c.strip() for c in cell.strip(', ')]))
```

But neither approach works in this case.
Desired output:
ColumnA
6-1/2" CAT, SMELLS, BAD, XS, A-403 -316L, 4" TAIL
5-1/2' DOG, ROUND HEAD, SLIM, 60 LB, A-182
1/2" Pipe, W/VALVE, Broken sides - packaging open, Like NEW
6" WEDDING RING, 1 ct, RF, 1/2" WIDE, Diamond MISC
5' Ladder, 50LB, new, 1/2' STEPS, 316L -, with packaging
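Data file: https://1drv.ms/x/s!ArCp0UbnlDoughmn3Io9aOvhNykZ?e=vZQXdC
I have already tried removing duplicates and similar approaches. I don't want to drop rows, and I don't want to drop columns. I have read https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop_duplicates.html but couldn't find my answer there.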
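For context on why `drop_duplicates` does not solve this: it compares entire rows and removes repeated rows, leaving the text inside each cell untouched. A minimal illustration, reusing two of the sample strings (the first one is repeated on purpose, purely for illustration):

```python
import pandas as pd

# Sample strings from the question; the first row is duplicated deliberately
df = pd.DataFrame({"ColumnA": [
    "5' Ladder, 50LB, new, 1/2' STEPS, 316L -, with packaging, 5' ladder",
    "5' Ladder, 50LB, new, 1/2' STEPS, 316L -, with packaging, 5' ladder",
    '1/2" Pipe, W/VALVE, Broken sides - packaging open, PIPE, Like NEW',
]})

# drop_duplicates compares whole rows and keeps the first occurrence by default
deduped = df.drop_duplicates()
print(deduped)
```

Only the repeated row disappears; the repeated words inside each cell ("Ladder"/"ladder", "Pipe"/"PIPE") are still there, so a word-level approach is needed instead.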
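Split each cell on `, ` to get a list of lists: the outer list holds the phrases separated by `, `, and each inner list holds the words of one phrase, separated by spaces. Then loop through each row and each inner list and drop any word that already appeared in an earlier phrase of the same row, comparing with `lower()` so the check is case-insensitive. The line `lst2.append([' '.join(sl1) for sl1 in lst])` joins the remaining words back into phrases, so `lst2` ends up as a list of lists in which the outer list is the rows and each inner list is that row's phrases. `df['ColumnA'] = lst2` stores this list of lists in the column, and the inner lists are then joined with `, ` to turn each row's phrases back into a single string. Finally, `.replace` does some cleanup where whole phrases were removed (the stray `, ,` in the middle and the trailing `, ` at the end).

```python
import pandas as pd

df = pd.DataFrame({'index': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4},
                   'ColumnA': {0: '6-1/2" CAT, SMELLS, BAD, XS, A-403 -316L, 4" CAT TAIL',
                               1: "5-1/2' DOG, ROUND HEAD, SLIM, 60 LB, A-182 dog",
                               2: '1/2" Pipe, W/VALVE, Broken sides - packaging open, PIPE, Like NEW',
                               3: '6" WEDDING RING, 1 ct, RF, 1/2" WIDE, Diamond MISC, Wedding Ring',
                               4: "5' Ladder, 50LB, new, 1/2' STEPS, 316L -, with packaging, 5' ladder"}})

# Split each cell into phrases (on ', ') and each phrase into words
df['ColumnA'] = df['ColumnA'].str.split(', ').apply(lambda x: [y.split() for y in x])

lst, lst2 = [], []
for i in df['ColumnA']:          # one row: a list of phrases, each a list of words
    for j in i:                  # one phrase
        # keep words not already kept in an earlier phrase of this row (case-insensitive)
        lst.append([k for k in j if k.lower()
                    not in [sl2.lower() for sl1 in lst for sl2 in sl1]])
    lst2.append([' '.join(sl1) for sl1 in lst])  # re-join words into phrases
    lst = []                     # reset for the next row

df['ColumnA'] = lst2
# Re-join the phrases, then remove the ', , ' left by emptied phrases and any trailing ', '
df['ColumnA'] = (df['ColumnA'].apply(lambda x: ', '.join(x))
                 .str.replace(' , ', ' ', regex=False)
                 .replace(', $', '', regex=True))
df
```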
Out[1]:
index ColumnA
0 0 6-1/2" CAT, SMELLS, BAD, XS, A-403 -316L, 4" TAIL
1 1 5-1/2' DOG, ROUND HEAD, SLIM, 60 LB, A-182
2 2 1/2" Pipe, W/VALVE, Broken sides - packaging open, Like NEW
3 3 6" WEDDING RING, 1 ct, RF, 1/2" WIDE, Diamond MISC
4 4 5' Ladder, 50LB, new, 1/2' STEPS, 316L -, with packaging
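The same idea can also be packaged as a per-cell function and applied with `Series.apply`. This is a minimal sketch, not the answer's exact code: the helper name `dedupe_words` is hypothetical, it tracks already-kept words in a set, and it skips phrases that end up empty instead of cleaning up `, ,` afterwards with `.replace`. One behavioral difference: it also drops a word repeated within the same phrase, which the loop above does not. Leaving non-string cells (such as `NaN` from empty Excel cells) unchanged is an assumption about how those should be handled.

```python
import pandas as pd


def dedupe_words(cell, sep=", "):
    """Hypothetical helper: drop words already seen earlier in the same cell
    (case-insensitive), keeping first occurrences in their original order."""
    if not isinstance(cell, str):  # assumption: leave NaN/numbers untouched
        return cell
    seen = set()      # lower-cased words kept so far in this cell
    phrases = []
    for phrase in cell.split(sep):
        kept = []
        for word in phrase.split():
            key = word.lower()
            if key not in seen:
                seen.add(key)
                kept.append(word)
        if kept:      # skip phrases that became empty, so no ', ,' cleanup is needed
            phrases.append(" ".join(kept))
    return sep.join(phrases)


df = pd.DataFrame({"ColumnA": [
    '6-1/2" CAT, SMELLS, BAD, XS, A-403 -316L, 4" CAT TAIL',
    "5-1/2' DOG, ROUND HEAD, SLIM, 60 LB, A-182 dog",
    '1/2" Pipe, W/VALVE, Broken sides - packaging open, PIPE, Like NEW',
    '6" WEDDING RING, 1 ct, RF, 1/2" WIDE, Diamond MISC, Wedding Ring',
    "5' Ladder, 50LB, new, 1/2' STEPS, 316L -, with packaging, 5' ladder",
]})

df["ColumnA"] = df["ColumnA"].apply(dedupe_words)
print(df["ColumnA"].tolist())
```

Because the set lookup is constant time, this avoids rebuilding the list of previously kept words for every word, which may matter on a sheet with thousands of rows.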
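Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.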