What is the fastest algorithm to remove every string that is a substring of another string in a list of strings? [Python (or other language)]

Question

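Given a list of strings, for example ["abc", "ab", "ad", "cde", "cde", "de", "def"], I want the output to be ["abc", "ad", "cde", "def"].

"ab" is removed because it is a substring of "abc".
"cde" is removed because it is a substring of the other "cde".
"de" is removed because it is a substring of "def".

What is the fastest algorithm?

I have a brute-force approach, which is O(n^2), as follows: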

def keep_long_str(str_list):
    # Process the longest strings first so that every superstring is seen
    # before any of its substrings (sorted() leaves the caller's list untouched).
    cleaned_str_list = []
    for element in sorted(str_list, key=len, reverse=True):
        element = element.lower()  # comparison is case-insensitive
        # Keep the element only if no already kept string contains it.
        if not any(element in kept for kept in cleaned_str_list):
            cleaned_str_list.append(element)
    return cleaned_str_list

Answer 1

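strings = ["abc", "ab", "ad", "cde", "cde", "de", "def"]
unique_strings = []

# Longest first, so every superstring is checked before its substrings.
for s in sorted(strings, key=len, reverse=True):
    if all(s not in uniq for uniq in unique_strings):
        unique_strings.append(s)

After running this code, unique_strings equals ['abc', 'cde', 'def', 'ad'].

Note: this may not be the fastest approach, but it is a simple solution.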
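If the strings are short, one way to avoid scanning every kept string is to trade memory for lookups. The following is only a minimal sketch under that assumption, not a benchmarked answer: the names remove_substrings and seen_substrings are made up for illustration, and the set grows quadratically with the length of each kept string.

```python
def remove_substrings(strings):
    """Keep only strings that are not substrings of an already kept string."""
    kept = []
    seen_substrings = set()  # every substring of every kept string
    # Longest first, so any superstring is processed before its substrings.
    for s in sorted(strings, key=len, reverse=True):
        if s in seen_substrings:
            continue  # s is a duplicate or a substring of a kept string
        kept.append(s)
        # The empty string is a substring of every string.
        seen_substrings.add("")
        # Record all non-empty substrings of s (quadratic in len(s)).
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                seen_substrings.add(s[i:j])
    return kept


print(remove_substrings(["abc", "ab", "ad", "cde", "cde", "de", "def"]))
# ['abc', 'cde', 'def', 'ad']
```

Each membership test becomes a single set lookup instead of a loop over the kept strings, so this probably only pays off when there are many strings and each one is short.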

Answer 2

I looked at Jack Moody's and Chris Charley's answers, but I still did not like using all when any can break out of the loop at the first superstring found, so I came up with this variation:

strings = ["abc", "ab", "ad", "cde", "cde", "de", "def"]
unique_strings = []
for s in sorted(strings, reverse=True):  # Reverse lexicographic order: "abc" comes before "ab"
    if not any(s in uniq for uniq in unique_strings):
        unique_strings.append(s)
print(unique_strings)  # ['def', 'cde', 'ad', 'abc']

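I assumed an explicit sort by len was unnecessary because length is already part of string comparison. That holds only when the shorter string is a prefix of the longer one: reverse lexicographic order puts "abc" before "ab", but it puts "bc" before "abc", so an input such as ["bc", "abc"] still needs a sort with key=len. Cheers :-)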
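To see how much the sort order matters, here is a small sketch that runs the same loop with two different orderings on an input where the shorter string is a suffix of the longer one rather than a prefix. The helper name dedupe and the sample list are made up for this check.

```python
def dedupe(strings, **sort_kwargs):
    """Keep strings that are not substrings of a previously kept string."""
    kept = []
    # sort_kwargs is passed straight to sorted(), e.g. key=len, reverse=True.
    for s in sorted(strings, **sort_kwargs):
        if not any(s in k for k in kept):
            kept.append(s)
    return kept


sample = ["bc", "abc"]
print(dedupe(sample, reverse=True))           # ['bc', 'abc']
print(dedupe(sample, key=len, reverse=True))  # ['abc']
```

With plain reverse sorting, "bc" is visited before "abc" and both are kept. Sorting by length visits "abc" first, so "bc" is dropped.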

2 answers

Answer 1
1 Accepted 2020-04-16 22:59:27

Answer 2
0 2020-04-17 16:22:48
