What is the fastest algorithm: in a string list, remove all the strings that are substrings of another string [Python (or other language)]

Question

There is a string list, for example ["abc", "ab", "ad", "cde", "cde", "de", "def"]. I would like the output to be ["abc", "ad", "cde", "def"].

"ab" was removed because it is the substring of "abc" "cde" was removed because it is the substring of another "cde" "de" was removed because it is the substring of "def" “ab”被删除，因为它是“abc”的 substring “cde”被删除，因为它是另一个“cde”的 substring “de”被删除，因为它是“def”的 substring

What is the fastest algorithm?

I have a brute-force method, which is O(n^2) in the number of strings, as follows:

def keep_long_str(str_list):
    # Visit the longest strings first, so any superstring is already kept
    # by the time one of its substrings is examined. sorted() leaves the
    # caller's list untouched.
    cleaned_str_list = []
    for element in sorted(str_list, key=len, reverse=True):
        element = element.lower()  # the comparison is case-insensitive
        # Drop the element if it occurs inside any string kept so far.
        if not any(element in kept for kept in cleaned_str_list):
            cleaned_str_list.append(element)
    return cleaned_str_list
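
To make the O(n^2) figure concrete, here is a minimal sketch that runs the same longest-first filter while counting the `in` tests it performs. The helper name `count_substring_checks` and the test data are made up for illustration. Note that each `in` test is itself not constant-time, so the real cost also depends on the string lengths.

```python
def count_substring_checks(strings):
    """Run the longest-first filter and count how many `in` tests it performs."""
    kept = []
    checks = 0
    for s in sorted(strings, key=len, reverse=True):
        is_substring = False
        for other in kept:
            checks += 1
            if s in other:
                is_substring = True
                break
        if not is_substring:
            kept.append(s)
    return kept, checks


# Worst case: distinct strings of equal length, so nothing is ever removed
# and every new string is compared against all strings kept so far.
worst_case = [f"{i:04d}" for i in range(100)]
kept, checks = count_substring_checks(worst_case)
print(len(kept), checks)  # 100 4950, i.e. n * (n - 1) / 2 checks
```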

Answer 1

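strings = ["abc", "ab", "ad", "cde", "cde", "de", "def"]
unique_strings = []

# Visit longer strings first so every superstring is seen before its substrings.
for s in sorted(strings, key=len, reverse=True):
    if all(s not in uniq for uniq in unique_strings):
        unique_strings.append(s)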

After running this code, unique_strings equals ['abc', 'cde', 'def', 'ad'].

Note: This is probably not the fastest way to do this, but it is a simple solution.
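
One possible way to cut down the pairwise comparisons, offered as a rough sketch rather than a benchmarked answer: remember every substring of the strings kept so far in a set, so each candidate needs a single set lookup. The function name `remove_substrings_with_set` is hypothetical. The trade-off is memory: a string of length L contributes on the order of L^2 substrings, so this is only plausible when the strings are short.

```python
def remove_substrings_with_set(strings):
    """Keep only strings that are not substrings of another (or duplicate) string."""
    seen_substrings = set()  # every substring of every string kept so far
    result = []
    for s in sorted(strings, key=len, reverse=True):
        if s in seen_substrings:
            continue  # s occurs inside a string that was already kept
        result.append(s)
        # The empty string is a substring of every string, including itself.
        seen_substrings.add("")
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                seen_substrings.add(s[i:j])
    return result


print(remove_substrings_with_set(["abc", "ab", "ad", "cde", "cde", "de", "def"]))
# ['abc', 'cde', 'def', 'ad']
```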

Answer 2

I looked at the answer by Jack Moody and Chris Charley and still didn't like the use of all when any could break out of the loop on the first occurrence of a superstring, so I came up with this alteration:

strings = ["abc", "ab", "ad", "cde", "cde", "de", "def"]
unique_strings = []
for s in sorted(strings, reverse=True):  # Largest first 
    if not any(s in uniq for uniq in unique_strings):
        unique_strings.append(s)
print(unique_strings)  # ['def', 'cde', 'ad', 'abc']

I didn't think there was a need to sort explicitly on string length, as it is part of string comparison anyway. Cheers :-)
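
A caveat worth checking before relying on the plain reverse sort: lexicographic order places a string ahead of its prefixes ("abc" sorts above "ab"), but not ahead of its other substrings ("bc" sorts above "abc"). The sketch below uses a hypothetical helper, `filter_substrings`, that wraps the loop above and takes an optional sort key, to compare the two orderings on such an input.

```python
def filter_substrings(strings, key=None):
    """Same loop as above; `key` selects the sort order (None = lexicographic)."""
    unique_strings = []
    for s in sorted(strings, key=key, reverse=True):
        if not any(s in uniq for uniq in unique_strings):
            unique_strings.append(s)
    return unique_strings


strings = ["abc", "bc"]
print(filter_substrings(strings))           # ['bc', 'abc']: "bc" is kept
print(filter_substrings(strings, key=len))  # ['abc']
```

Sorting with key=len guarantees that every superstring is visited before any of its substrings, whatever their position inside it.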

2 answers

Answer 1: score 1, accepted, 2020-04-16 22:59:27

Answer 2: score 0, 2020-04-17 16:22:48
