Python: Remove strings from a list that contain at least one other string from the same list

Question

I would like to filter my list of strings in the following way: I want to exclude a string if at least one other string in the same list is "in" it. Put differently, I want to keep a string only if no other string from the same list is in it. The comparison should be case-sensitive, if possible.

To make this clearer, here is an example:

My initial list, which contains every string:

elements =["tree","TREE","treeforest","water","waterfall"]

After applying the solution, I would like to get this list:

elements = ["tree","TREE","water"]

For example, tree is in treeforest, so treeforest is excluded from my list. The same applies to water and waterfall. However, tree, TREE and water should be kept, because no other strings are "in" them.

As I'd like to apply this to a larger list of strings, more efficient solutions are preferred.

I hope this is understandable. Thanks a lot in advance! Any help is highly appreciated.

Answer 1

A fairly optimized function with two loops that saves a lot of loop iterations:

def filterlist(l):
    # keep track of elements, which will be deleted
    deletelist = [False for _ in l]

    for i, el in enumerate(l):
        # already in deletelist, jump right to the next el
        if deletelist[i]:
            continue

        for j, el2 in enumerate(l):
            # comparing item to itself or el2 already in deletelist?
            # jump to next el2
            if i == j or deletelist[j]:
                continue

            # the comparison everyone expects
            if el in el2:
                deletelist[j] = True

            # also, check the other way around
            # will save loop iterations later
            elif el2 in el:
                deletelist[i] = True
                break # causes jump to next el

    # create new list, keep elements that are not in deletelist
    return [el for i, el in enumerate(l) if not deletelist[i]]

Built-in functions are usually faster, so let's compare it to Ed Ward's solution:

# result of Ed Ward's solution using timeit:
100000 loops, best of 10: 5.38 usec per loop

# filterlist function with loops using timeit:
100000 loops, best of 10: 4.42 usec per loop

Interesting, but to get a really representative result, you should run timeit with a larger list of elements.
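One way to run that larger comparison is sketched below. The word list is synthetic: the make_words helper, its size, and the seed are assumptions made purely for illustration. filterlist is a condensed restatement of the function above, and filter_any is a built-in filter/any one-liner that is assumed here to be representative of the built-in approach. Absolute timings depend on the machine and on how many strings contain each other, so no numbers are quoted.

```python
import random
import string
import timeit


def make_words(n, seed=0):
    """Hypothetical helper: n random lowercase strings of length 2 to 6."""
    rng = random.Random(seed)
    return ["".join(rng.choices(string.ascii_lowercase, k=rng.randint(2, 6)))
            for _ in range(n)]


def filterlist(l):
    """Condensed restatement of the two-loop function above."""
    deletelist = [False] * len(l)
    for i, el in enumerate(l):
        if deletelist[i]:
            continue
        for j, el2 in enumerate(l):
            if i == j or deletelist[j]:
                continue
            if el in el2:
                deletelist[j] = True
            elif el2 in el:
                deletelist[i] = True
                break
    return [el for i, el in enumerate(l) if not deletelist[i]]


def filter_any(elements):
    """Built-in variant: keep items that contain no other, different string."""
    return list(filter(
        lambda item: not any(elem in item for elem in elements if elem != item),
        elements))


# Deduplicate first so both functions treat the input the same way.
words = list(dict.fromkeys(make_words(300)))

for func in (filterlist, filter_any):
    seconds = min(timeit.repeat(lambda: func(words), number=3, repeat=3))
    print(f"{func.__name__}: {seconds / 3:.6f} s per call")
```

Raising the argument of make_words shows how each approach scales; both are quadratic in the worst case, so the difference comes from how early each one can stop comparing.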

Answer 2

from copy import deepcopy

def remove_composite_words(e,elements):
  temp = [x for x in elements if e in x]
  temp = set(temp)
  elements = list(set(elements).difference(temp))
  return e,sorted(elements, key=len)

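def keep_shortest_root(elements):
  elements = deepcopy(elements)
  elements = list(set(elements))
  elements = sorted(elements, key=len)
  # an empty string is contained in every string, so drop it first
  # (the extra truth check avoids an IndexError on an empty input list)
  if elements and len(elements[0]) == 0:
    elements = elements[1:]

  results = []
  while elements:
    # the shortest remaining word cannot contain any other remaining word
    e, elements = remove_composite_words(elements[0], elements)
    results.append(e)

  return results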
  
elements =["tree","TREE","treeforest","water","waterfall",'forestTREE','tree']

keep_shortest_root(elements)

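This should return the following (the relative order of equal-length words such as 'tree' and 'TREE' may vary, because the list is deduplicated through a set):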

['tree', 'TREE', 'water']

How it works:

The function remove_composite_words() tests whether an element is contained in any element of the list and collects those that match (including the element itself). It then removes the matching elements from the initial list.

So if you have the element 'a' and the list ['a','aa','b','c'], the function will return 'a' and the list ['b','c'].

keep_shortest_root() applies remove_composite_words() to the initial list and then to the transformed list (the output of remove_composite_words()) until there are no words left.

Note that keep_shortest_root() first gets the unique words from the input list and then sorts them by length. Combined with the fact that remove_composite_words() removes the matched words from the list, this makes the algorithm run faster, since the number of comparisons drops with each iteration.
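The same shortest-first idea can be sketched more compactly. This is a variant under stated assumptions, not the code from this answer: keep_roots is a hypothetical name, empty strings are skipped to mirror the behaviour described above, and duplicates are dropped because a word that was already kept is "in" its own copy.

```python
def keep_roots(elements):
    """Keep only strings that contain no other, different string from the list."""
    roots = []
    # sorted() is stable, so equal-length words keep their input order.
    for word in sorted(elements, key=len):
        if not word:
            continue  # assumption: empty strings are ignored
        # Any different string contained in `word` is strictly shorter, so it
        # was examined earlier; checking the kept roots is therefore enough.
        if not any(root in word for root in roots):
            roots.append(word)
    return roots


elements = ["tree", "TREE", "treeforest", "water", "waterfall", "forestTREE", "tree"]
print(keep_roots(elements))  # ['tree', 'TREE', 'water']
```

Because each word is compared only against the roots kept so far, the amount of work shrinks when many strings share a few short roots.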

Answer 3

I found a somewhat simpler solution than the ones already provided and thought I might chip in:

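def Remove_Subset(List):
    # build a new list: removing from a list while iterating over it skips items
    Result = []
    for Element2 in List:
        # drop Element2 if a different string from the list is contained in it
        IsComposite = any((Element1 in Element2) and (Element1 != Element2) for Element1 in List)
        # also skip exact duplicates of strings that were already kept
        if not IsComposite and Element2 not in Result:
            Result.append(Element2)
    return Result
elements =["treeforest","tree","TREE","treeforest","water","waterfall","tree"]
print(Remove_Subset(elements))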


>>> ['tree', 'TREE', 'water']

Answer 4

This is an explanation of the answer I gave in my comment.

I used this code:

new_elements = list(filter(lambda item: not any(elem in item for elem in elements if elem != item), elements))

which yields:

['tree', 'TREE', 'water']

I don't know how much you know about Python generator expressions and filter, so I'll try to explain anyway.

filter is a Python built-in function that takes a function and applies it to each item of the supplied iterable (e.g. a list), keeping the items for which it returns a true value. In our case, the function is this:

lambda item: not any(elem in item for elem in elements if elem != item)

This function takes an item from the list (item), iterates over every element in the list (for elem in elements), and for each element (elem) checks whether that element is in our string (item). Note that the condition if elem != item skips any element equal to item, because we don't want to compare a string with itself.

The function any simply keeps iterating until either one of the yielded values is True or it reaches the end. If there were any matches, any returns True, but to tell filter to drop this item we need to return False, so we invert the output of any.

We also pass our list (elements) to filter and convert the result of filter to another list.
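For readers who find the one-liner dense, the same logic can be spelled out with an explicit loop. This is only an illustrative expansion; the helper name contains_other is an assumption and does not appear in the original code.

```python
def contains_other(item, elements):
    """Return True if some different string from `elements` occurs inside `item`."""
    for elem in elements:
        if elem == item:
            continue  # same text as item: not counted as "another" string
        if elem in item:
            return True  # one match is enough to drop the item
    return False


elements = ["tree", "TREE", "treeforest", "water", "waterfall"]
# Keep only the items for which no other string was found inside them.
new_elements = [item for item in elements if not contains_other(item, elements)]
print(new_elements)  # ['tree', 'TREE', 'water']
```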

Note: the bonus of using any instead of checking every item against every other item is that when a match is found, we don't have to iterate over the rest of the list: any returns at that point. In theory, this could be faster than two nested for-loops without a break statement.
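The early exit can be made visible with a small counting sketch. The count_checks helper is hypothetical and exists only to show how many candidates any examines before it stops.

```python
def count_checks(item, elements):
    """Count how many candidates `any` examines before it stops."""
    checks = 0

    def candidates():
        nonlocal checks
        for elem in elements:
            if elem != item:
                checks += 1
                yield elem in item

    found = any(candidates())
    return found, checks


elements = ["tree", "TREE", "treeforest", "water", "waterfall"]
print(count_checks("treeforest", elements))  # (True, 1)
print(count_checks("water", elements))       # (False, 4)
```

For "treeforest" the very first candidate matches, so any stops after one check; for "water" nothing matches, so all four other strings are examined.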
