Remove similar duplicates from a list of strings
I'm trying to remove similar duplicates from my list. Here is my code:
l = ["shirt", "shirt", "shirt len", "pant", "pant cotton", "len pant", "watch"]
res = [*set(l)]
print(res)
This removes only the exact duplicate "shirt", but I also want to remove similar entries such as "shirt len", "pant cotton", and "len pant".
Expected output: shirt, pant, watch
It sounds like you want to check whether any of the single-word strings appears inside another string, and if so, drop that longer string as a duplicate. I would go about it this way:
l = ["shirt", "shirt", "shirt len", "pant", "pant cotton", "len pant", "watch"]
single, longer = set(), set()
for s in l:
    if len(s.split()) == 1:
        single.add(s)
    else:
        longer.add(s)
res = set()
for s in longer:
    if not any(word in s for word in single):
        res.add(s)
res |= single
print(res)
This example gives (set order may vary):
{'shirt', 'watch', 'pant'}
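Note that `word in s` is a substring test, so a single word such as "watch" would also remove an entry like "smart watchband". If whole-word matching is what you want, and you would rather keep the original order than get an unordered set, here is a minimal sketch under those assumptions (`dedupe_similar` is a hypothetical helper name):

```python
def dedupe_similar(items):
    """Drop exact repeats, plus multi-word entries that contain a
    single-word entry as a whole word. Keeps first-seen order."""
    single = {s for s in items if len(s.split()) == 1}
    res = []
    seen = set()
    for s in items:
        words = s.split()
        # A multi-word entry counts as "similar" if any of its whole
        # words is also a standalone single-word entry.
        if len(words) > 1 and any(w in single for w in words):
            continue
        # Skip exact repeats while preserving the original order.
        if s not in seen:
            seen.add(s)
            res.append(s)
    return res


l = ["shirt", "shirt", "shirt len", "pant", "pant cotton", "len pant", "watch"]
print(dedupe_similar(l))  # ['shirt', 'pant', 'watch']

# Whole-word matching: "watch" does not knock out "smart watchband".
print(dedupe_similar(["watch", "smart watchband"]))  # ['watch', 'smart watchband']
```

Splitting on whitespace means "watchband" is compared as one token, so it only matches if "watchband" itself appears as a standalone entry.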
You can try something like the following: select the single-word elements from the list, then apply set():
lst = ["shirt", "shirt", "shirt len", "pant", "pant cotton", "len pant", "watch"]
print({ls for ls in lst if ' ' not in ls})
# Output (order may vary): {'pant', 'shirt', 'watch'}
Note: if your input is ["shirt", "shirt", "shirt len", "pant cotton", "len pant", "watch"] (with no standalone "pant"), the output will be {'shirt', 'watch'}.
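If you need a list in the original order rather than a set, a minimal variant (assuming order matters to you) combines the same space filter with `dict.fromkeys`, which drops exact repeats while keeping first-seen order:

```python
lst = ["shirt", "shirt", "shirt len", "pant", "pant cotton", "len pant", "watch"]

# dict.fromkeys removes exact repeats and keeps insertion order (Python 3.7+);
# the filter then keeps only entries without a space, i.e. single words.
res = [s for s in dict.fromkeys(lst) if " " not in s]
print(res)  # ['shirt', 'pant', 'watch']
```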
If you would still like to include pant and cotton, you can try splitting every element into words:
set(sum([ls.split(' ') for ls in lst], []))
# Output (order may vary): {'cotton', 'len', 'pant', 'shirt', 'watch'}
Then filter out words by whatever conditions your requirements call for.
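For example, here is a minimal sketch assuming your condition is a hand-maintained set of words to drop (`unwanted` is a hypothetical name and its contents are only illustrative). It also flattens with `itertools.chain.from_iterable` instead of `sum(..., [])`, which avoids building a new list on every addition:

```python
from itertools import chain

lst = ["shirt", "shirt", "shirt len", "pant cotton", "len pant", "watch"]

# Flatten every entry into its individual words, deduplicated by the set.
words = set(chain.from_iterable(s.split() for s in lst))

# Hypothetical condition: a hand-picked set of words to discard.
unwanted = {"len", "cotton"}
res = words - unwanted
print(sorted(res))  # ['pant', 'shirt', 'watch']
```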