What is the fastest algorithm: in a string list, remove all the strings which are substrings of another string [Python (or other language)]
There is a string list, for example ["abc", "ab", "ad", "cde", "cde", "de", "def"]. I would like the output to be ["abc", "ad", "cde", "def"].
"ab" was removed because it is a substring of "abc"; one "cde" was removed because it is a substring of the other "cde"; "de" was removed because it is a substring of "def".
What is the fastest algorithm?
I have a brute-force method, which is O(n^2) in the number of strings, as follows:
def keep_long_str(str_list):
    str_list.sort(key = lambda x: -len(x))
    cleaned_str_list = []
    for element in str_list:
        element = element.lower()
        keep_element = 1
        for cleaned_element in cleaned_str_list:
            if element in cleaned_element:
                keep_element = 0
                break
            else:
                keep_element = 1
        if keep_element:
            cleaned_str_list.append(element)
    return cleaned_str_list
strings = ["abc", "ab", "ad", "cde", "cde", "de", "def"]
unique_strings = []
# Longest first, so every possible superstring is seen before its substrings.
for s in sorted(strings, key=len, reverse=True):
    if all(s not in uniq for uniq in unique_strings):
        unique_strings.append(s)

After running this code, unique_strings equals ['abc', 'cde', 'def', 'ad'].
Note: This is probably not the fastest way to do this, but it is a simple solution.
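One way to cut down the pairwise comparisons is to trade memory for lookups. The following is only a sketch, not a benchmarked answer: it assumes the strings are short, because it stores every substring of every kept string in a set. The function name remove_substrings is made up for illustration.

```python
def remove_substrings(strings):
    """Return the strings that are not substrings of another kept string.

    Exact duplicates collapse to a single copy.
    """
    kept = []
    seen_substrings = set()  # every substring of every kept string
    # Longest first: a string can only be a substring of an equal or longer one.
    for s in sorted(strings, key=len, reverse=True):
        if s in seen_substrings:
            continue  # s already occurs inside a kept string
        kept.append(s)
        # Record all substrings of s, including s itself and the empty string.
        for i in range(len(s) + 1):
            for j in range(i, len(s) + 1):
                seen_substrings.add(s[i:j])
    return kept


print(remove_substrings(["abc", "ab", "ad", "cde", "cde", "de", "def"]))
# ['abc', 'cde', 'def', 'ad']
```

Each membership test is a single hash lookup, but a kept string of length L adds on the order of L^2 entries to the set, so this only pays off when there are many short strings. For long strings, a suffix automaton or Aho-Corasick style structure would be the more usual tool.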
I looked at the answers by Jack Moody and Chris Charley and still didn't like the use of all where any states the intent more directly: stop at the first superstring found. (Both short-circuit, so this is a readability change rather than a speed-up.) So I came up with this alteration:
strings = ["abc", "ab", "ad", "cde", "cde", "de", "def"]
unique_strings = []
for s in sorted(strings, key=len, reverse=True):  # Longest first
    if not any(s in uniq for uniq in unique_strings):
        unique_strings.append(s)
print(unique_strings)  # ['abc', 'cde', 'def', 'ad']

The sort does need to be explicit on len. Plain string comparison is lexicographic, so sorted(strings, reverse=True) only guarantees that a string comes before its own prefixes; with ["b", "ab"] it would visit "b" first and keep both. Cheers :-)