
How to obtain a list of strings such that it represents all the strings in a given list?

I have a list of strings. From this list I want to generate a new list of strings such that all the strings are unique (I know I can use a set to do this), but the new list should also be such that no string in it is a subset of any other string in the list.

EDIT: Based on the comments I have received, here is a clarification: the word "subset" is not accurate; it should be "substring".
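To make the distinction concrete: in Python a substring test is the `in` operator between two strings, while a subset test compares the sets of characters. A small sketch using a word pair taken from the examples below:

```python
word, text = 'man', 'catamaran'

# Substring: the characters must appear contiguously and in order.
is_substring = word in text            # False: 'catamaran' has no 'man' run

# Subset: only asks whether every character of word occurs somewhere in text.
is_subset = set(word) <= set(text)     # True: 'm', 'a' and 'n' all occur

print(is_substring, is_subset)  # False True
```

So under the "subset" reading 'man' would be dropped, while under the "substring" reading it has to be kept.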

I think this should work:

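def unique_sublist(lst):
    # Longest strings first, so a string is only ever tested against longer ones.
    sorted_lst = sorted(lst, key=len, reverse=True)
    subs = set()    # every substring of every string kept so far
    result = []
    for s in sorted_lst:
        if s not in subs:   # s is not contained in any kept string
            subs.update(s[i:j] for i in range(len(s))
                        for j in range(i + 1, len(s) + 1))
            result.append(s)
    # Restore the order in which the strings appeared in the input.
    return sorted(result, key=lst.index)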

>>> unique_sublist(['a', 'man', 'ran', 'at', 'a', 'catamaran', 'boat'])
['man', 'catamaran', 'boat']

>>> unique_sublist(['abcd', 'abyet', 'abcd betry', 'outry', 'rumunu abyetin', 'takama eli', 'com betry', 'rumunu', 'foutrym'])
['abcd betry', 'rumunu abyetin', 'takama eli', 'com betry', 'foutrym']

My edit fixes a few issues with the previous code. Note that it now keeps the longer string whenever one string contains another.
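If you want to keep the original order without sorting twice, the same "keep the longer string" rule can be sketched as a direct pairwise filter. This is an alternative, not the code above, and `maximal_strings` is a hypothetical name. It deduplicates with `dict.fromkeys` (which keeps first-seen order in Python 3.7+) and then keeps every string that is not contained in a different string. It does O(n²) substring checks but never stores all substrings, which can matter for long strings.

```python
def maximal_strings(words):
    """Keep each distinct string that is not a substring of another distinct string.

    Order of first appearance in `words` is preserved.
    """
    # dict.fromkeys drops duplicates but keeps first-seen order (Python 3.7+).
    unique = list(dict.fromkeys(words))
    return [s for s in unique
            if not any(s != t and s in t for t in unique)]


print(maximal_strings(['a', 'man', 'ran', 'at', 'a', 'catamaran', 'boat']))
# ['man', 'catamaran', 'boat']
```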

This simple code:

def funky(alist):
    result = []
    # Visit the longest strings first; a string is kept only if it is not
    # contained in a string that was already kept (duplicates are skipped too).
    for s in sorted(alist, key=len, reverse=True):
        if not any(s in item for item in result):
            result.append(s)
    return result  # no ordering requirement was specified

print(funky(['a', 'man', 'ran', 'at', 'a', 'catamaran', 'boat']))
print(funky(['abcd', 'abyet', 'abcd betry', 'outry', 'rumunu abyetin', 'takama eli', 'com betry', 'rumunu', 'foutrym']))

produces:

['catamaran', 'boat', 'man']
['rumunu abyetin', 'abcd betry', 'takama eli', 'com betry', 'foutrym']
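Whichever version you use, the two properties asked for in the question can be checked mechanically: every input string must appear inside some kept string, and no kept string may appear inside another. A small sketch; `check_result` is a hypothetical helper, not part of any answer here:

```python
def check_result(original, result):
    """Return True if `result` represents `original` as the question asks."""
    # 1. Coverage: each original string is a substring of some kept string.
    covers = all(any(s in r for r in result) for s in original)
    # 2. Minimality: no kept string is a substring of a different kept
    #    string (this also rejects duplicates in `result`).
    no_nesting = not any(a in b
                         for i, a in enumerate(result)
                         for j, b in enumerate(result) if i != j)
    return covers and no_nesting


words = ['a', 'man', 'ran', 'at', 'a', 'catamaran', 'boat']
print(check_result(words, ['catamaran', 'boat', 'man']))  # True
print(check_result(words, ['catamaran', 'man']))          # False: 'boat' is lost
```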

I think this does it:

li = [ 'abcd',
       'abyet',
       'abcd betry',
       'outry',
       'rumunu abyetin',
       'takama eli',
       'com betry',
       'rumunu',
       'foutrym']


la = []
for x in li:
    # Keep x only if it neither contains nor is contained in a string kept so far.
    if not any(x in el or el in x for el in la):
        la.append(x)

print(li)
print()
print(la)

Result:

['abcd', 'abyet', 'abcd betry', 'outry', 'rumunu abyetin', 'takama eli', 'com betry', 'rumunu', 'foutrym']

['abcd', 'abyet', 'outry', 'takama eli', 'com betry', 'rumunu']

PS

But if 'abyet' and 'rumunu abyetin' are swapped in the original list, the deduced list will contain 'rumunu abyetin' and not 'abyet'.
Why is 'abyet' allowed in the deduced list in the first case but not in the second? Because of its position in the original list.

You need to specify additional criteria for accepting or rejecting a string in the resulting list, because as it stands, it seems to me that several different lists could answer your question for a given input list.
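One possible extra criterion is "keep a string only if no other string contains it". Under that rule the result no longer depends on the order of the input: the kept strings are exactly those not inside any other string, which is a property of the collection, not of positions. A sketch comparing the order-dependent loop above with that rule on the swapped list from the PS; `greedy` and `keep_longest` are hypothetical names:

```python
def greedy(words):
    """The first-come loop above: keep x unless it overlaps an already kept string."""
    kept = []
    for x in words:
        if not any(x in el or el in x for el in kept):
            kept.append(x)
    return kept


def keep_longest(words):
    """Keep each distinct string that no other distinct string contains."""
    unique = set(words)
    return {s for s in unique if not any(s != t and s in t for t in unique)}


li = ['abcd', 'abyet', 'abcd betry', 'outry', 'rumunu abyetin',
      'takama eli', 'com betry', 'rumunu', 'foutrym']
swapped = li[:]
swapped[1], swapped[4] = swapped[4], swapped[1]   # swap 'abyet' and 'rumunu abyetin'

print('abyet' in greedy(li), 'abyet' in greedy(swapped))   # True False
print(keep_longest(li) == keep_longest(swapped))            # True
```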

PS

This answer of mine clearly doesn't deserve an upvote. The upvoter is kindly asked to remove the upvote.

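alist = ['a', 'man', 'ran', 'at', 'a', 'catamaran', 'boat']
result = set()
for word1 in alist:
    if word1 in result:
        continue  # shortcut for performance reasons only
    to_remove = []
    for word2 in result:
        if word1 in word2:
            break  # word1 is already represented by a longer kept word
        if word2 in word1:
            to_remove.append(word2)  # word1 makes word2 redundant
    else:
        # No break: word1 is not contained in any kept word, so keep it.
        result.add(word1)
    # Modify the set only after iterating over it has finished.
    for word in to_remove:
        result.remove(word)
print(result)

gives (the order of elements in a printed set can vary between runs)

{'catamaran', 'boat', 'man'}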
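In Python 3 a set prints in an order that can change between runs. Here is a sketch of the same incremental idea that uses a `dict` as an insertion-ordered set (guaranteed since Python 3.7), wrapped in a function; `maximal_incremental` is a hypothetical name. Because each word is only compared against the current result, it also works on any iterable, such as words read from a file one at a time.

```python
def maximal_incremental(words):
    """Incrementally keep words that are not substrings of other kept words."""
    result = {}  # used as an insertion-ordered set; values are ignored
    for word1 in words:
        if word1 in result:
            continue  # exact duplicate, nothing to do
        # Already represented by a longer kept word?
        if any(word1 in word2 for word2 in result):
            continue
        # word1 makes some kept words redundant: drop them, then keep word1.
        # (Collect first, so the dict is not modified while iterating it.)
        for word2 in [w for w in result if w in word1]:
            del result[word2]
        result[word1] = None
    return list(result)


print(maximal_incremental(['a', 'man', 'ran', 'at', 'a', 'catamaran', 'boat']))
# ['man', 'catamaran', 'boat']
```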

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.