Python: Remove Strings in a List that are contained by at least one other String in the same List
How can I drop strings that are contained in another string in the same list?
I have a list of strings and need to remove the items that are contained in other items, like this:
a = ["one", "one single", "one single trick", "trick", "trick must", "trick must get", "one single trick must", "must get", "must get the job done"]
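I just need to remove every string that is contained in another string in the same list. For example, "one" is contained in "one single", so it must be removed; "one single" is in turn contained in "one single trick", so it needs to be dropped as well.
I tried:
b=a
for item in a:
    for element in b:
        if item in element:
            b.remove(element)
Expected result: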
a = ["trick must get", "one single trick must", "must get the job done"]
Any help would be much appreciated! Thanks in advance!
A list comprehension combined with Python's any function should do this nicely:
a = [phrase for phrase in a if not any([phrase2 != phrase and phrase in phrase2 for phrase2 in a])]
Result:
>>> a = ["one", "one single", "one single trick", "trick", "trick must", "trick must get", "one single trick must", "must get", "must get the job done"]
>>> a = [phrase for phrase in a if not any([phrase2 != phrase and phrase in phrase2 for phrase2 in a])]
>>> a
['trick must get', 'one single trick must', 'must get the job done']
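The same idea can be sketched as a reusable function. The name drop_contained is hypothetical, not something from the answer above. This version passes a generator to any instead of a list, so the check can stop at the first match. It assumes the list holds no duplicate strings: two equal strings are skipped by the != test, so both would be kept.

```python
def drop_contained(phrases):
    """Return the phrases that are not a substring of any other, different phrase."""
    return [
        p for p in phrases
        # Generator expression: any() stops as soon as one containing phrase is found.
        if not any(p != other and p in other for other in phrases)
    ]


a = ["one", "one single", "one single trick", "trick", "trick must",
     "trick must get", "one single trick must", "must get",
     "must get the job done"]
result = drop_contained(a)
print(result)
# ['trick must get', 'one single trick must', 'must get the job done']
```

This still compares every pair of strings, so the work grows quadratically with the length of the list.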
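A more efficient approach than comparing every pair of strings is to use a set that tracks all sub-phrases (contiguous runs of words) of the phrases kept so far. Iterate from the longest string to the shortest, and add a string to the output only if it is not already in the set of sub-phrases. After the sort, each string costs one set lookup plus the generation of its own sub-phrases, which is quadratic in its word count, so this is not strictly O(n) overall. Note that it matches whole words rather than arbitrary substrings: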
seen = set()
output = []
for s in sorted(a, key=len, reverse=True):
    words = tuple(s.split())
    if words not in seen:
        output.append(s)
        seen.update({words[i: i + n] for i in range(len(words)) for n in range(len(words) - i + 1)})
output
gives:
['one single trick must', 'must get the job done', 'trick must get']
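One thing worth checking before picking the set-based approach is that it compares whole words, while the `in` operator on strings compares raw characters. The sketch below wraps the loop in a function (the name drop_subphrases is hypothetical) and shows an input where the two approaches disagree: "on" is a substring of "one single" but is not one of its words.

```python
def drop_subphrases(phrases):
    """Keep phrases whose word sequence is not a contiguous run of words
    inside a longer phrase that has already been kept."""
    seen = set()
    output = []
    for s in sorted(phrases, key=len, reverse=True):
        words = tuple(s.split())
        if words not in seen:
            output.append(s)
            # Record every non-empty contiguous run of words of this phrase.
            seen.update(
                words[i:j]
                for i in range(len(words))
                for j in range(i + 1, len(words) + 1)
            )
    return output


word_based = drop_subphrases(["on", "one single"])
print(word_based)  # ['one single', 'on']

# Character-based containment treats "on" as contained in "one single".
print("on" in "one single")  # True
```

If partial-word matches such as "on" inside "one" should also be removed, the substring-based answers are the ones to use.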
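Not an efficient solution, but by sorting from longest to shortest and popping the last element each time, we can check whether each element appears as a substring anywhere in the remaining, longer strings.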
a = ['one', 'one single', 'one single trick', 'trick', 'trick must', 'trick must get',
     'one single trick must', 'must get', 'must get the job done']
a = sorted(a, key=len, reverse=True)
b = []
for i in range(len(a)):
    x = a.pop()
    if x not in "\t".join(a):
        b.append(x)
# ['trick must get', 'must get the job done', 'one single trick must']
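The loop above empties a as it goes. If the original list should stay intact, a variant along the following lines might work. The function name drop_contained_sorted is hypothetical, and, like the code above, it assumes no phrase contains a tab character, since tabs are used as the separator when joining.

```python
def drop_contained_sorted(phrases):
    """Keep each phrase that is not a substring of any phrase of equal or
    greater length that comes after it in shortest-first order."""
    ordered = sorted(phrases, key=len)  # shortest first; input is not modified
    kept = []
    for i, x in enumerate(ordered):
        # Join only the later (equal-length or longer) phrases with a tab.
        rest = "\t".join(ordered[i + 1:])
        if x not in rest:
            kept.append(x)
    return kept


a = ['one', 'one single', 'one single trick', 'trick', 'trick must',
     'trick must get', 'one single trick must', 'must get',
     'must get the job done']
result = drop_contained_sorted(a)
print(result)
# ['trick must get', 'one single trick must', 'must get the job done']
```

The order of equal-length results can differ from the pop-based loop, because the sort direction is reversed and ties keep their original relative order.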