
How can I drop strings contained in other strings in the same string list?

I have a list of strings and need to remove the items that are contained in other items, as shown:

a = ["one", "one single", "one single trick", "trick", "trick must", "trick must get", "one single trick must", "must get", "must get the job done"]

I just need to drop every string that is contained in another string in the same list. For example, "one" is contained in "one single", so it must be dropped; "one single" is in turn contained in "one single trick", so it must be dropped as well.

I have tried:

b=a
for item in a:
    for element in b:
        if item in element:
            b.remove(element)

Expected result:

a = ["trick must get", "one single trick must", "must get the job done"]

Any help will be greatly appreciated! Thanks in advance!

A list comprehension should do this quite nicely, combined with Python's built-in any function:

a = [phrase for phrase in a if not any([phrase2 != phrase and phrase in phrase2 for phrase2 in a])]

Result:

>>> a = ["one", "one single", "one single trick", "trick", "trick must", "trick must get", "one single trick must", "must get", "must get the job done"]
>>> a = [phrase for phrase in a if not any([phrase2 != phrase and phrase in phrase2 for phrase2 in a])]
>>> a
['trick must get', 'one single trick must', 'must get the job done']
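
As a minimal sketch, the same comprehension can be wrapped in a function (the name drop_contained is made up for illustration) and given a generator expression instead of a list, so that any can stop at the first match. It still compares every phrase against every other phrase, so the cost is quadratic in the number of phrases.

```python
def drop_contained(phrases):
    """Return the phrases that are not a substring of any other, different phrase."""
    return [
        p for p in phrases
        # The generator lets any() short-circuit on the first containing phrase.
        if not any(p != other and p in other for other in phrases)
    ]


a = ["one", "one single", "one single trick", "trick", "trick must",
     "trick must get", "one single trick must", "must get",
     "must get the job done"]

print(drop_contained(a))
# ['trick must get', 'one single trick must', 'must get the job done']
```

Because of the p != other check, exact duplicates do not eliminate each other, so a string that appears twice is kept twice.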

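A more efficient approach is to use a set that keeps track of every sub-phrase (contiguous run of words) of the phrases seen so far. Iterate from the longest string to the shortest, and add a string to the output only if it is not already in the set of sub-phrases. Each phrase then costs a single set lookup instead of a comparison against every other phrase, so apart from the O(n log n) sort the work grows linearly with the number of strings, provided the phrases are short (a phrase of m words contributes O(m^2) sub-phrases). Note that this matches whole words rather than arbitrary substrings: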

seen = set()
output = []
for s in sorted(a, key=len, reverse=True):
    words = tuple(s.split())
    if words not in seen:
        output.append(s)
    seen.update({words[i: i + n] for i in range(len(words)) for n in range(len(words) - i + 1)})

output becomes:

['one single trick must', 'must get the job done', 'trick must get']
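
To make the whole-word behaviour concrete, here is a hedged sketch of the same idea wrapped in a function (drop_contained_phrases is a hypothetical name, and empty slices are skipped, which the snippet above does not do). It shows that "one" is not treated as part of "someone", unlike a plain substring test.

```python
def drop_contained_phrases(phrases):
    """Keep phrases whose word sequence is not a sub-phrase of a longer phrase."""
    seen = set()      # every contiguous run of words seen so far
    output = []
    for s in sorted(phrases, key=len, reverse=True):
        words = tuple(s.split())
        if words not in seen:
            output.append(s)
        count = len(words)
        # Record all non-empty contiguous word runs of this phrase.
        seen.update(
            words[i:j]
            for i in range(count)
            for j in range(i + 1, count + 1)
        )
    return output


print(drop_contained_phrases(["someone", "one"]))
# ['someone', 'one']  (word-level matching keeps both)
print("one" in "someone")
# True  (a character-level substring test would drop "one")
```

Whether word-level or character-level matching is wanted depends on the data; the question's example gives the same result either way.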

Not an efficient solution, but it works: sort from longest to shortest, then repeatedly pop the last (shortest) element and check whether it appears as a substring anywhere in the remaining strings.

a = ['one', 'one single', 'one single trick', 'trick', 'trick must', 'trick must get', 
     'one single trick must', 'must get', 'must get the job done']
a = sorted(a, key=len, reverse=True)
b = []
for i in range(len(a)):
    x = a.pop()
    if x not in "\t".join(a):
        b.append(x)

# ['trick must get', 'must get the job done', 'one single trick must']
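
The loop above empties a as it runs. A small sketch of the same technique as a function that leaves its input untouched might look like this (drop_substrings and the sep parameter are names introduced here, and it assumes no phrase contains the separator character):

```python
def drop_substrings(phrases, sep="\t"):
    """Keep phrases that do not occur inside any longer remaining phrase."""
    # sorted() returns a new list, so the caller's list is not modified.
    remaining = sorted(phrases, key=len, reverse=True)
    kept = []
    while remaining:
        shortest = remaining.pop()
        # Joining with a separator prevents matches that span two phrases.
        if shortest not in sep.join(remaining):
            kept.append(shortest)
    return kept


a = ['one', 'one single', 'one single trick', 'trick', 'trick must',
     'trick must get', 'one single trick must', 'must get',
     'must get the job done']

print(drop_substrings(a))
# ['trick must get', 'must get the job done', 'one single trick must']
print(len(a))
# 9
```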

