
What is the fastest algorithm: in a string list, remove all the strings which are substrings of another string [Python (or other language)]

There is a string list, for example ["abc", "ab", "ad", "cde", "cde", "de", "def"]. I would like the output to be ["abc", "ad", "cde", "def"].

"ab" was removed because it is a substring of "abc".
"cde" was removed because it is a substring of another "cde".
"de" was removed because it is a substring of "def".

What is the fastest algorithm?

I have a brute-force method, which is O(n^2), as follows:

def keep_long_str(str_list):
    # Visit longer strings first, so any super-string is already kept
    # by the time its substrings are checked. Note: sorts str_list in place.
    str_list.sort(key=lambda x: -len(x))
    cleaned_str_list = []
    for element in str_list:
        element = element.lower()
        keep_element = True
        for cleaned_element in cleaned_str_list:
            if element in cleaned_element:
                keep_element = False
                break
        if keep_element:
            cleaned_str_list.append(element)
    return cleaned_str_list

strings = ["abc", "ab", "ad", "cde", "cde", "de", "def"]
unique_strings = []

# Longest first, so a super-string is always seen before its substrings
for s in sorted(strings, key=len, reverse=True):
    if all(s not in uniq for uniq in unique_strings):
        unique_strings.append(s)

After running this code, unique_strings equals ['abc', 'cde', 'def', 'ad'].

Note: This is probably not the fastest way to do this, but it is a simple solution.
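
One cheap way to shave the constant factor is to keep the accepted strings joined in a single buffer, so each candidate needs one `in` test instead of a Python-level loop over the kept list. The sketch below is only an illustration under stated assumptions and is not part of the answer above: the function name `remove_substrings_joined` and the separator `"\x00"` are hypothetical, and the approach only works if the separator never occurs in the input strings. It likely does not change the worst-case amount of character comparison; it just moves the inner loop into the built-in substring search.

```python
def remove_substrings_joined(strings, sep="\x00"):
    """Drop every string that occurs inside (or equals) another kept string.

    Assumption: `sep` never occurs inside any input string.
    """
    if any(sep in s for s in strings):
        raise ValueError("separator occurs in input; pick a different sep")
    kept = []
    joined = ""  # every kept string so far, each preceded by sep
    # Longest first, so every possible super-string is already in `joined`
    # before its substrings are tested.
    for s in sorted(strings, key=len, reverse=True):
        # `not kept` lets a lone empty string through; otherwise one
        # substring search over the buffer replaces the inner loop.
        if not kept or s not in joined:
            kept.append(s)
            joined += sep + s
    return kept


print(remove_substrings_joined(["abc", "ab", "ad", "cde", "cde", "de", "def"]))
# ['abc', 'cde', 'def', 'ad']
```

Because no candidate contains the separator, a match can never span two kept strings, so a hit in the buffer always means a hit inside one kept string.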

I looked at the answer by Jack Moody and Chris Charley and still didn't like the use of all where any would do: any stops at the first super-string it finds (all over the negated test short-circuits at the same point, so the difference is readability rather than speed). I came up with this alteration:

strings = ["abc", "ab", "ad", "cde", "cde", "de", "def"]
unique_strings = []
for s in sorted(strings, reverse=True):  # Reverse lexicographic order: a string comes after any string it is a prefix of
    if not any(s in uniq for uniq in unique_strings):
        unique_strings.append(s)
print(unique_strings)  # ['def', 'cde', 'ad', 'abc']

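I didn't think there was a need to sort explicitly on string length, since a string always compares less than any longer string it is a prefix of. That holds for prefixes only, though: a substring that starts later in the longer string (e.g. "bc" in "abc") is visited before it in reverse lexicographic order and is therefore kept, so sorting with key=len is the safe choice when substrings can occur anywhere. Cheers :-)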
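
To see what the sort key changes, here is a small sketch that runs the same `any` loop under reverse lexicographic order and under longest-first order, on an input where the shorter string is a suffix rather than a prefix of the longer one. The helper name `drop_substrings` and its `key` parameter are made up for illustration.

```python
def drop_substrings(strings, key=None):
    """Keep each string unless it occurs inside a string kept earlier.

    `key` is passed straight to sorted(); None means plain
    lexicographic comparison, as in the snippet above.
    """
    kept = []
    for s in sorted(strings, key=key, reverse=True):
        if not any(s in k for k in kept):
            kept.append(s)
    return kept


sample = ["bc", "abc"]
print(drop_substrings(sample))           # ['bc', 'abc']
print(drop_substrings(sample, key=len))  # ['abc']
```

With lexicographic order "bc" is visited before "abc", so both survive. Sorting by length avoids this because a substring is never longer than its super-string, and equal-length duplicates are caught by the `in` test.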


