How to find which strings in an array are substrings to another string in python?
What is the fastest algorithm: in a string list, remove all the strings which are substrings of another string [Python (or other language)]
I have a list of strings, e.g. ["abc", "ab", "ad", "cde", "cde", "de", "def"], and I want the output to be ["abc", "ad", "cde", "def"].
"ab" is removed because it is a substring of "abc".
One "cde" is removed because it is a substring of the other "cde".
"de" is removed because it is a substring of "def".
What is the fastest algorithm?
I have a brute-force method that is O(n^2) in the number of strings, as follows:
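def keep_long_str(str_list):
    # Longest strings first, so every superstring is kept before its substrings.
    # sorted() returns a new list instead of reordering the caller's list.
    candidates = sorted(str_list, key=len, reverse=True)
    cleaned_str_list = []
    for element in candidates:
        element = element.lower()  # compare case-insensitively
        # Keep the element only if no already-kept string contains it.
        keep_element = True
        for cleaned_element in cleaned_str_list:
            if element in cleaned_element:
                keep_element = False
                break
        if keep_element:
            cleaned_str_list.append(element)
    return cleaned_str_list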
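Since the question asks for the fastest algorithm, here is a minimal sketch of one swapped-in technique that none of the answers below use: an Aho-Corasick multi-pattern automaton built over all distinct strings. Each string is then walked through the automaton, and every other string that ends at each position is marked as contained. Assumptions: matching is exact and case-sensitive (keep_long_str lowercases first; apply .lower() to the inputs if you need that), dict lookups are treated as O(1), and the name remove_substrings_ac is hypothetical. Under those assumptions the work should grow roughly linearly with the total number of characters rather than with the number of string pairs.

```python
from collections import deque


def remove_substrings_ac(strings):
    """Hypothetical helper (sketch): return the distinct strings that are not
    substrings of another distinct string. Exact, case-sensitive matching."""
    unique = list(dict.fromkeys(strings))  # drop exact duplicates, keep order

    # 1. Trie over all distinct strings. Node 0 is the root.
    goto = [{}]      # goto[node][char] -> child node
    word_at = [-1]   # index into `unique` of the string ending at node, or -1
    for i, word in enumerate(unique):
        node = 0
        for ch in word:
            if ch not in goto[node]:
                goto[node][ch] = len(goto)
                goto.append({})
                word_at.append(-1)
            node = goto[node][ch]
        word_at[node] = i

    # 2. Failure links (longest proper suffix that is also a trie node) and
    #    output links (nearest node on the failure chain that ends a string).
    fail = [0] * len(goto)
    out = [-1] * len(goto)
    queue = deque(goto[0].values())
    for child in queue:
        # The root ends a string only if "" is in the input.
        out[child] = 0 if word_at[0] != -1 else -1
    while queue:
        node = queue.popleft()
        for ch, child in goto[node].items():
            f = fail[node]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            target = fail[child]
            out[child] = target if word_at[target] != -1 else out[target]
            queue.append(child)

    # 3. Walk every string through the automaton. Because the string is itself
    #    in the trie, the state after each character is simply the trie node of
    #    that prefix, so no failure transitions are needed while scanning.
    contained = [False] * len(unique)
    for word in unique:
        node = 0
        last = len(word) - 1
        for pos, ch in enumerate(word):
            node = goto[node][ch]
            # Strings ending here: `node` itself (if it ends one), then its
            # out-chain. At the last position `node` is `word`, which must not count.
            hit = node if word_at[node] != -1 and pos < last else out[node]
            # Stop at the first already-marked string: its whole out-chain
            # was marked when it was marked.
            while hit != -1 and not contained[word_at[hit]]:
                contained[word_at[hit]] = True
                hit = out[hit]
    return [w for i, w in enumerate(unique) if not contained[i]]
```

On the question's list, remove_substrings_ac(["abc", "ab", "ad", "cde", "cde", "de", "def"]) returns ['abc', 'ad', 'cde', 'def'], in first-seen order. The early stop in step 3 keeps the marking linear: a string is marked only once, and its suffixes on the out-chain are marked along with it.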
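strings = ["abc", "ab", "ad", "cde", "cde", "de", "def"]
unique_strings = []
# Longest first, so a superstring is always kept before its substrings.
for s in sorted(strings, key=len, reverse=True):
    # Keep s only if no already-kept string contains it.
    if all(s not in uniq for uniq in unique_strings):
        unique_strings.append(s)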
After running this code, unique_strings equals ['abc', 'cde', 'def', 'ad'].
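Note: this may not be the fastest approach, since each string can still be compared against every kept string, but it is a simple solution.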
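A smaller change that stays close to the simple loop is sketched below (again not taken from the answers; the name remove_substrings_count and the NUL separator are assumptions). All distinct strings are joined into one haystack with a separator assumed never to occur in the data, and str.count does the scanning in C. Each string appears once as its own segment, so a count above 1 means it also occurs inside some other string. This is not asymptotically faster than the loop, because each count call still scans the whole haystack. It only moves the inner loop out of the interpreter.

```python
def remove_substrings_count(strings, sep="\x00"):
    """Hypothetical helper (sketch): keep each distinct string that is not a
    substring of another distinct string.

    Assumes `sep` (a NUL character by default) never occurs in any input string.
    """
    # Drop exact duplicates while keeping first-seen order.
    unique = list(dict.fromkeys(strings))
    # One big haystack; `sep` prevents a match from spanning two strings.
    haystack = sep.join(unique)
    # Each string is counted once as its own segment; any further occurrence
    # must lie inside some other (longer) string.
    return [s for s in unique if haystack.count(s) == 1]
```

On the question's list it also returns ['abc', 'ad', 'cde', 'def'].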
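I looked at Jack Moody's and Chris Charley's answers, but still didn't like using all() where any() can break out of the loop at the first superstring found, so I came up with this variation (strictly speaking, all() also stops at the first superstring, so the change is about readability rather than speed):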
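strings = ["abc", "ab", "ad", "cde", "cde", "de", "def"]
unique_strings = []
for s in sorted(strings, key=len, reverse=True):  # Longest first
    if not any(s in uniq for uniq in unique_strings):
        unique_strings.append(s)
print(unique_strings)  # ['abc', 'cde', 'def', 'ad']
Sorting explicitly by len is needed, though: reverse lexicographic order only guarantees that a string is checked after the longer strings it is a prefix of (e.g. "abc" before "ab"), not after strings that contain it in the middle or at the end. Cheers :-)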
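A quick check of whether the sort key matters, using a hypothetical two-string input where "de" is a suffix of "cde". The helper keep_maximal is illustrative only. It runs the same "keep s unless an already-kept string contains it" loop with either plain reverse ordering or reverse length ordering.

```python
def keep_maximal(strings, key=None):
    # Hypothetical helper mirroring the answer's loop: visit candidates in
    # descending order (by `key` if given) and keep a string only if no
    # already-kept string contains it.
    kept = []
    for s in sorted(strings, key=key, reverse=True):
        if not any(s in k for k in kept):
            kept.append(s)
    return kept


example = ["cde", "de"]  # hypothetical input: "de" is a suffix of "cde"
print(keep_maximal(example))           # ['de', 'cde']: "de" is visited first and kept
print(keep_maximal(example, key=len))  # ['cde']
```

With reverse lexicographic order, "de" sorts ahead of "cde" (because "d" > "c"), so it is checked before its superstring has been kept.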