
Remove a string that is a substring of another string in the list WITHOUT changing the original order of the list?

I have a list.

the_list = ['Donald Trump has', 'Donald Trump has small fingers', 'What is going on?']

I'd like to remove "Donald Trump has" from the_list because it's a substring of another list element.

Here is the important part: I want to do this without disturbing the order of the original list.

The function I have (below) distorts the order of the original list, because it first sorts the list items by length.

def substr_sieve(list_of_strings):
    dups_removed = list_of_strings[:]
    # Sorts the caller's list in place; this is what reorders the original list.
    list_of_strings.sort(key=len)
    for i in range(len(list_of_strings)):
        for j in range(i + 1, len(list_of_strings)):
            if list_of_strings[i] in list_of_strings[j]:
                try:
                    dups_removed.remove(list_of_strings[i])
                except ValueError:
                    # Already removed on an earlier match.
                    pass
    return dups_removed
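
One detail worth noting about the function above: list.sort() works in place, so the list that is passed in gets reordered. A minimal sketch of the difference between list.sort() and sorted() (the variable names here are made up for illustration):

```python
original = ['What is going on?', 'Donald Trump has small fingers', 'Donald Trump has']

# sorted() returns a new list, so the original keeps its order.
by_length = sorted(original, key=len)
order_after_sorted = list(original)

# list.sort() reorders the very same list object in place.
original.sort(key=len)
order_after_sort = list(original)

print(order_after_sorted)
print(order_after_sort)
```

If the sort is only needed internally, sorting a copy leaves the caller's list alone.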

A simple solution.

But first, let's also add 'Donald Trump', 'Donald' and 'Trump' at the end to make it a better test case.

>>> forbidden_text = "\nX08y6\n" # choose a text that will hardly appear in any sensible string
>>> the_list = ['Donald Trump has', 'Donald Trump has small fingers', 'What is going on?',
...             'Donald Trump', 'Donald', 'Trump']
>>> new_list = [item for item in the_list if forbidden_text.join(the_list).count(item) == 1]
>>> new_list
['Donald Trump has small fingers', 'What is going on?']

Logic:

  1. Concatenate all list elements into a single string: forbidden_text.join(the_list)
  2. Check whether each item occurs only once in that string. If it occurs more than once, it is a substring of another element: count(item) == 1
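
The two steps above can be wrapped in a function that builds the joined string once rather than once per item. This is only a sketch: the function name and the separator parameter are hypothetical, and it assumes the separator never appears inside any of the strings.

```python
def remove_substrings_by_count(strings, separator="\nX08y6\n"):
    """Keep only the strings that occur exactly once in the joined text."""
    # Step 1: join everything once (assumes separator is absent from all items).
    joined = separator.join(strings)
    # Step 2: an item counted more than once is contained in another item.
    return [item for item in strings if joined.count(item) == 1]


the_list = ['Donald Trump has', 'Donald Trump has small fingers',
            'What is going on?', 'Donald Trump', 'Donald', 'Trump']
new_list = remove_substrings_by_count(the_list)
print(new_list)
```

The input list is not modified, and the surviving items keep their original relative order.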

str.count(sub[, start[, end]])

Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.
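
A few small cases illustrate the behaviour quoted above (the sample strings and variable names are made up for illustration):

```python
# Occurrences are counted without overlap: "aa" matches at index 0 and index 2.
non_overlapping = "aaaa".count("aa")

text = "Donald Trump, Donald"
whole = text.count("Donald")             # both occurrences
from_index_one = text.count("Donald", 1)  # the first occurrence starts before index 1
first_six = text.count("Donald", 0, 6)    # text[0:6] is exactly "Donald"
first_five = text.count("Donald", 0, 5)   # text[0:5] is too short to contain it

print(non_overlapping, whole, from_index_one, first_six, first_five)
```

As in slicing, the end index is exclusive.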


forbidden_text is used as the separator instead of "" (the empty string) to handle cases like this:

>>> the_list = ['DonaldTrump', 'Donald', 'Trump']
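
To see why the separator matters for this list, here is a small comparison of joining with the empty string versus joining with forbidden_text (the helper variable names are made up for illustration):

```python
the_list = ['DonaldTrump', 'Donald', 'Trump']
forbidden_text = "\nX08y6\n"

joined_without_separator = "".join(the_list)          # 'DonaldTrumpDonaldTrump'
joined_with_separator = forbidden_text.join(the_list)

# With no separator, 'Donald' + 'Trump' forms a second, spurious 'DonaldTrump'.
count_without = joined_without_separator.count('DonaldTrump')
count_with = joined_with_separator.count('DonaldTrump')

kept_without = [s for s in the_list if joined_without_separator.count(s) == 1]
kept_with = [s for s in the_list if joined_with_separator.count(s) == 1]

print(count_without, count_with)
print(kept_without, kept_with)
```

Without the separator every element is dropped, including 'DonaldTrump', which is not a substring of any other element.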


As Nishant correctly pointed out, the above code fails for the_list = ['Donald', 'Donald']: each copy is counted twice, so both are removed.

Joining set(the_list) instead of the_list solves the problem:
>>> new_list = [item for item in the_list if forbidden_text.join(set(the_list)).count(item) == 1]
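
A short check of the duplicate case, with and without the set (variable names are made up for illustration). Note that with the set both copies survive; the last line shows one possible way to collapse them while keeping order, assuming Python 3.7+ where dicts preserve insertion order.

```python
forbidden_text = "\nX08y6\n"
the_list = ['Donald', 'Donald']

# Joining the raw list: 'Donald' is counted twice, so both copies are dropped.
joined_all = forbidden_text.join(the_list)
without_set = [item for item in the_list if joined_all.count(item) == 1]

# Joining the set: 'Donald' appears once, so both copies are kept.
joined_unique = forbidden_text.join(set(the_list))
with_set = [item for item in the_list if joined_unique.count(item) == 1]

# Optional: keep only the first copy of each string, in original order.
deduplicated = list(dict.fromkeys(with_set))

print(without_set, with_set, deduplicated)
```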

You can do this without sorting:

the_list = ['Donald Trump has', "I've heard Donald Trump has small fingers",
            'What is going on?']

def winnow(a_list):
    keep = set()
    for item in a_list:
        if not any(item in other for other in a_list if item != other):
            keep.add(item)
    return [ item for item in a_list if item in keep ]

winnow(the_list)

Sorting may allow fewer comparisons overall, but that seems highly data-dependent, and could be a premature optimization.
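
For comparison, here is a sketch of what a sorting-based variant might look like while still preserving the original order. It is not part of the answer above; the name winnow_sorted is hypothetical. It relies on the fact that a string can only be contained in a string that is at least as long, so each candidate is compared only against the longer strings already kept.

```python
def winnow_sorted(strings):
    kept = []
    # Visit distinct strings from longest to shortest (a sorted copy,
    # so the input list itself is never reordered).
    for candidate in sorted(set(strings), key=len, reverse=True):
        # Anything that contains candidate is either kept already or is
        # itself contained in a kept string, so checking kept is enough.
        if not any(candidate in longer for longer in kept):
            kept.append(candidate)
    kept_set = set(kept)
    # Filter the original list so the original order is preserved.
    return [item for item in strings if item in kept_set]


result = winnow_sorted(['Donald Trump has',
                        "I've heard Donald Trump has small fingers",
                        'What is going on?'])
print(result)
```

Whether this actually saves comparisons depends on how many strings survive, which is the data dependence mentioned above.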

You can just recursively reduce the items.

Algorithm:

Pop each item off the list in turn and decide whether it needs to be kept. Then call the same function recursively on the reduced list. The recursion continues as long as the list has at least one item and stops once it is empty.

Efficiency: this might not be the most efficient approach. Perhaps a divide-and-conquer method would be more apt?

the_list = ['Donald Trump has', 'Donald Trump has small fingers',
            'What is going on?']

final_list = []

def remove_or_append(items):
    # Recurse only while there is something left to examine.
    if len(items):
        first_value = items.pop(0)
        found = False
        # Is it a substring of any item still waiting to be examined?
        for each in items:
            if first_value in each:
                found = True
                break
        # Is it a substring of any item already kept?
        for each in final_list:
            if first_value in each:
                found = True
                break
        if not found:
            final_list.append(first_value)
        remove_or_append(items)

remove_or_append(the_list)

print(final_list)
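
The same idea can also be sketched as a loop, which avoids Python's recursion limit on long lists and leaves the input list untouched. The function name here is hypothetical; the checks mirror the recursive version (compare against the items not yet examined, then against the items already kept).

```python
def remove_substrings_iterative(strings):
    kept = []
    for index, candidate in enumerate(strings):
        # Items that come later in the list and have not been examined yet.
        remaining = strings[index + 1:]
        if any(candidate in other for other in remaining):
            continue
        # Items that were examined earlier and kept.
        if any(candidate in other for other in kept):
            continue
        kept.append(candidate)
    return kept


the_list = ['Donald Trump has', 'Donald Trump has small fingers',
            'What is going on?']
final_list = remove_substrings_iterative(the_list)
print(final_list)
```

With this approach an exact duplicate is treated as a substring of its twin, so only the last copy of a repeated string is kept.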

A slightly different version is:

def substring_of_anything_else(item, strings):
    # item is an (index, string) pair; skip the comparison with itself.
    for idx, each in enumerate(strings):
        if idx == item[0]:
            continue
        if item[1] in each:
            return True
    # Only report False after every other element has been checked.
    return False

final_list = [item for idx, item in enumerate(the_list)
              if not substring_of_anything_else((idx, item), the_list)]
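
The index-based comparison can also be written compactly with any(). This is a sketch with a hypothetical function name. Because elements are told apart by position rather than by value, two identical strings count as substrings of each other and both are removed.

```python
def drop_substrings_by_index(strings):
    return [
        item
        for i, item in enumerate(strings)
        # Compare against every element at a different position.
        if not any(item in other for j, other in enumerate(strings) if j != i)
    ]


the_list = ['Donald Trump has', 'Donald Trump has small fingers',
            'What is going on?']
final_list = drop_substrings_by_index(the_list)
print(final_list)
```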

