I have a list of strings and need to remove the items that are contained in other items, as shown:
a = ["one", "one single", "one single trick", "trick", "trick must", "trick must get", "one single trick must", "must get", "must get the job done"]
I just need to drop every string that is contained in another string in the same list. For example, "one" is contained in "one single", so it must be dropped; then "one single" is contained in "one single trick", so it also needs to be dropped.
I have tried:
b = a
for item in a:
    for element in b:
        if item in element:
            b.remove(element)
expected result:
a = ["trick must get", "one single trick must", "must get the job done"]
Any help will be greatly appreciated! Thanks in advance!
A list comprehension should do this quite nicely, combined with Python's any function:
a = [phrase for phrase in a if not any([phrase2 != phrase and phrase in phrase2 for phrase2 in a])]
result:
>>> a = ["one", "one single", "one single trick", "trick", "trick must", "trick must get", "one single trick must", "must get", "must get the job done"]
>>> a = [phrase for phrase in a if not any([phrase2 != phrase and phrase in phrase2 for phrase2 in a])]
>>> a
['trick must get', 'one single trick must', 'must get the job done']
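As for why the attempt in the question misbehaves: `b = a` does not copy the list, so both names refer to the same object and items are removed from the list being iterated over; `item in element` is also true when the two are the same string; and the code removes `element` (the containing string) rather than `item` (the contained one). Below is a minimal sketch of the same comprehension wrapped in a function. The name `drop_contained` is made up for illustration, and a generator is passed to `any` so the scan can stop at the first match.

```python
def drop_contained(phrases):
    """Return the phrases that are not a substring of any other phrase."""
    return [
        p for p in phrases
        # any() short-circuits on the first different phrase that contains p
        if not any(p != q and p in q for q in phrases)
    ]


a = ["one", "one single", "one single trick", "trick", "trick must",
     "trick must get", "one single trick must", "must get",
     "must get the job done"]
result = drop_contained(a)
print(result)
# ['trick must get', 'one single trick must', 'must get the job done']
```

This builds a new list and leaves the input untouched, which avoids the aliasing problem entirely. It still compares every pair of strings, so the cost grows quadratically with the length of the list.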
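A more efficient approach is to keep a set of every sub-phrase (contiguous run of words) of the phrases seen so far, iterate from the longest string to the shortest, and add a string to the output only if it is not already in that set. Apart from the initial sort, each phrase then costs one set lookup plus the generation of its sub-phrases, which is quadratic in the number of words in the phrase, so the total is close to linear in the number of phrases when the phrases are short. Note that this compares whole words rather than arbitrary substrings: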
seen = set()
output = []
for s in sorted(a, key=len, reverse=True):
    words = tuple(s.split())
    if words not in seen:
        output.append(s)
    seen.update({words[i: i + n] for i in range(len(words)) for n in range(len(words) - i + 1)})
output
becomes:
['one single trick must', 'must get the job done', 'trick must get']
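Because the set holds tuples of words, this approach treats a phrase as contained only when it matches a run of whole words, which is not quite the same as the substring test used elsewhere in this thread. The sketch below shows the difference on a small input; the function name `drop_contained_word_runs` and the sample list are assumptions made for illustration.

```python
def drop_contained_word_runs(phrases):
    """Drop phrases whose words form a contiguous run inside a longer phrase."""
    seen = set()   # every contiguous run of words seen so far
    kept = []
    for s in sorted(phrases, key=len, reverse=True):
        words = tuple(s.split())
        if words not in seen:
            kept.append(s)
        # Record every non-empty contiguous run of words in s.
        seen.update(
            words[i:j]
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)
        )
    return kept


sample = ["one single", "one", "on"]
word_result = drop_contained_word_runs(sample)
print(word_result)
# ['one single', 'on']
```

Here "on" survives because it is not a whole word of "one single", whereas a plain substring test would drop it. Which behaviour is wanted depends on the data.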
This is not an efficient solution, but if we sort from longest to shortest and repeatedly pop the last (shortest) element, we can check whether it appears as a substring anywhere in the remaining, longer strings.
a = ['one', 'one single', 'one single trick', 'trick', 'trick must', 'trick must get',
     'one single trick must', 'must get', 'must get the job done']
a = sorted(a, key=len, reverse=True)
b = []
for i in range(len(a)):
    x = a.pop()
    if x not in "\t".join(a):
        b.append(x)
# ['trick must get', 'must get the job done', 'one single trick must']
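The loop above empties `a` as it runs, and joining on a tab assumes that no string contains a tab itself. A variant of the same idea that avoids both is sketched below; the function name `drop_contained_sorted` is made up for illustration. It sorts shortest first and compares each string only against the strings that come after it.

```python
def drop_contained_sorted(phrases):
    """Keep each phrase unless a later (equal or longer) phrase contains it."""
    ordered = sorted(phrases, key=len)  # shortest first; input is not modified
    kept = []
    for i, p in enumerate(ordered):
        # Only strings at later positions can contain p.
        if not any(p in q for q in ordered[i + 1:]):
            kept.append(p)
    return kept


a = ['one', 'one single', 'one single trick', 'trick', 'trick must', 'trick must get',
     'one single trick must', 'must get', 'must get the job done']
sorted_result = drop_contained_sorted(a)
print(sorted_result)
# ['trick must get', 'one single trick must', 'must get the job done']
```

With exact duplicates in the input, only the last copy is kept, since each earlier copy is found inside a later one.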