
What is the fastest algorithm to remove, from a list of strings, every string that is a substring of another string? [Python (or other language)]

There is a list of strings, for example ["abc", "ab", "ad", "cde", "cde", "de", "def"]. I would like the output to be ["abc", "ad", "cde", "def"].

"ab" was removed because it is a substring of "abc"; one "cde" was removed because it is a substring of the other "cde" (so duplicates collapse to a single copy); "de" was removed because it is a substring of "def" (and also of "cde").
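The rule above can be written down directly as a small reference function, which is handy for checking faster versions against. This is a sketch, not code from the question, and the name remove_substrings_reference is hypothetical. It collapses exact duplicates first, then drops any string that occurs inside a different string, which reproduces the expected output in its original order.

```python
def remove_substrings_reference(strings):
    """Hypothetical reference version: drop s if it occurs in another string."""
    # Collapse exact duplicates first (dict keeps first-seen order), so two
    # equal strings such as "cde", "cde" leave exactly one survivor.
    unique = list(dict.fromkeys(strings))
    # Keep s only if it does not occur inside any *different* string.
    return [s for s in unique if not any(s != t and s in t for t in unique)]


print(remove_substrings_reference(["abc", "ab", "ad", "cde", "cde", "de", "def"]))
# ['abc', 'ad', 'cde', 'def']
```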

What is the fastest algorithm?

I have a brute-force method, which needs O(n^2) substring checks for n strings:

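def keep_long_str(str_list):
    # Longest strings first: a string can only be a substring of one that
    # is at least as long. sorted() leaves the caller's list unchanged.
    cleaned_str_list = []
    for element in sorted(str_list, key=len, reverse=True):
        element = element.lower()  # compare case-insensitively
        # Keep the element only if no already-kept string contains it;
        # any() stops at the first match, like the original break.
        if not any(element in kept for kept in cleaned_str_list):
            cleaned_str_list.append(element)
    return cleaned_str_list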
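One way to cut the per-comparison interpreter overhead without changing the quadratic shape is to let a single C-level `in` search do the work for each string. This is a sketch under stated assumptions: the name remove_substrings_joined is hypothetical, and the separator "\0" is assumed never to occur in the input. Duplicates are removed first, so a string can only be a substring of a strictly longer one; strings are then handled one length group at a time, and each group is checked against one separator-joined string of everything kept so far. Unlike keep_long_str, it is case-sensitive.

```python
from itertools import groupby


def remove_substrings_joined(strings, sep="\0"):
    """Hypothetical helper: drop every string that occurs inside another one.

    Assumes `sep` never occurs in any input string, so a match in the
    joined haystack can never straddle two kept strings.
    """
    # Collapse exact duplicates (dict keeps first-seen order), then sort
    # longest first; sorted() is stable, so ties keep their input order.
    unique = sorted(dict.fromkeys(strings), key=len, reverse=True)
    kept = []
    for _, group in groupby(unique, key=len):
        # Everything kept so far is strictly longer than this group:
        # join it once and do a single substring search per candidate.
        haystack = sep.join(kept)
        # For the longest group `kept` is empty; "" in "" would be True,
        # so skip the search there.
        survivors = [s for s in group if not kept or s not in haystack]
        kept.extend(survivors)
    return kept


print(remove_substrings_joined(["abc", "ab", "ad", "cde", "cde", "de", "def"]))
# ['abc', 'cde', 'def', 'ad']
```

The output comes back longest-first rather than in input order; sort it afterwards if the original order matters.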
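
strings = ["abc", "ab", "ad", "cde", "cde", "de", "def"]
unique_strings = []

# Longest first, so each string is tested against every longer candidate.
for s in sorted(strings, key=len, reverse=True):
    if all(s not in uniq for uniq in unique_strings):
        unique_strings.append(s)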

After running this code, unique_strings equals ['abc', 'cde', 'def', 'ad'].

Note: This is probably not the fastest way to do this, but it is a simple solution.
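If the strings are short (as in the example), a different trade-off avoids pairwise comparisons entirely: collect every proper substring of every distinct string into a set, then keep the strings that never appear in it. A string of length L has O(L^2) substrings, each hashed in O(L), so this scales linearly in the number of strings but badly in their length. This is a sketch, and remove_substrings_by_set is a hypothetical name.

```python
def remove_substrings_by_set(strings):
    """Hypothetical helper: keep strings that are no other string's substring."""
    unique = list(dict.fromkeys(strings))  # collapse exact duplicates
    covered = set()
    for s in unique:
        n = len(s)
        # Every substring s[i:j] except s itself (i == 0 and j == n);
        # the empty string (i == j) is a proper substring of any non-empty s.
        for i in range(n + 1):
            for j in range(i, n + 1):
                if (i, j) != (0, n):
                    covered.add(s[i:j])
    return [s for s in unique if s not in covered]


print(remove_substrings_by_set(["abc", "ab", "ad", "cde", "cde", "de", "def"]))
# ['abc', 'ad', 'cde', 'def']
```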

I looked at the answers by Jack Moody and Chris Charley and still preferred not any(...) to all(...), since it reads as "stop at the first super-string" (over a generator, all short-circuits in the same way), so I came up with this alteration:
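Both all and any over a generator stop at the first element that decides the result. The sketch below (check is a hypothetical helper) records which kept strings each form actually inspects before it finishes:

```python
def check(s, kept, use_any):
    """Return (keep s?, list of kept strings inspected before deciding)."""
    inspected = []

    def contains(t):
        inspected.append(t)  # record every container the generator reaches
        return s in t

    if use_any:
        keep = not any(contains(t) for t in kept)
    else:
        keep = all(not contains(t) for t in kept)
    return keep, inspected


kept = ["abc", "cde", "def"]
print(check("ab", kept, use_any=True))   # (False, ['abc'])
print(check("ab", kept, use_any=False))  # (False, ['abc'])
```

Both forms stop at "abc" for "ab", so the choice between them is about readability rather than speed.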

strings = ["abc", "ab", "ad", "cde", "cde", "de", "def"]
unique_strings = []
for s in sorted(strings, key=len, reverse=True):  # longest first
    if not any(s in uniq for uniq in unique_strings):
        unique_strings.append(s)
print(unique_strings)  # ['abc', 'cde', 'def', 'ad']

Sorting explicitly on string length is needed, though: reverse lexicographic order (sorted(strings, reverse=True)) only guarantees that a string comes before its own prefixes, not before every string that contains it, so for ["b", "ab"] it keeps both. Cheers :-)
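A quick way to see why the sort key matters is to run the same loop with and without key=len on a two-string input. keep_maximal is a hypothetical wrapper around the loop from this answer, with the sort key as a parameter:

```python
def keep_maximal(strings, key=None):
    """Hypothetical wrapper: the answer's loop with a configurable sort key."""
    unique_strings = []
    for s in sorted(strings, key=key, reverse=True):
        # Keep s only if no already-kept string contains it.
        if not any(s in uniq for uniq in unique_strings):
            unique_strings.append(s)
    return unique_strings


print(keep_maximal(["b", "ab"]))           # ['b', 'ab']  ("b" wrongly kept)
print(keep_maximal(["b", "ab"], key=len))  # ['ab']
```

With plain reverse lexicographic order "b" sorts before "ab", so it is kept before its super-string is ever seen.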

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM