Python - List of unique sequences

I have a dictionary whose values are lists, each holding a certain sequence:

a = {'seq1':['5', '4', '3', '2', '1', '6', '7', '8', '9'], 
     'seq2':['9', '8', '7', '6', '5', '4', '3', '2', '1'],
     'seq3':['5', '4', '3', '2', '1', '11', '12', '13', '14'],
     'seq4':['15', '16', '17'],
     'seq5':['18', '19', '20', '21', '22', '23'],
     'seq6':['18', '19', '20', '24', '25', '26']}

So there are 6 sequences.

What I need to do is:

  • Find only the unique lists. Two lists that contain the same elements, regardless of order, count as duplicates; here that means getting rid of the second list (the first unique list found stays).
  • In the unique lists, find the unique subsequences of elements and print them.

The bounds of the unique subsequences are determined by where the element order matches: in the 1st and 3rd lists the shared run ends exactly after element '1', so we get the subsequence ['5','4','3','2','1'].

As the result I would like to see the elements in exactly the same order as they were originally (if that is possible at all). So I expect this:

[['5', '4', '3', '2', '1'], ['6', '7', '8', '9'], ['11', '12', '13', '14'], ['15', '16', '17'], ['18', '19', '20'], ['21', '22', '23'], ['24', '25', '26']]

I tried to do it this way:

import itertools

unique_sets = []

a = {'seq1':["5","4","3","2","1","6","7","8","9"], 'seq2':["9","8","7","6","5","4","3","2","1"], 'seq3':["5","4","3","2","1","11","12","13","14"], 'seq4':["15","16","17"], 'seq5':["18","19","20","21","22","23"], 'seq6':["18","19","20","24","25","26"]}

b = []

for seq in a.values():
    b.append(seq)

for seq1, seq2 in itertools.combinations(b,2):                                     #searching for intersections 
    if set(seq1).intersection(set(seq2)) not in unique_sets:
        #if set(seq1).intersection(set(seq2)) == set(seq1):
            #continue
        unique_sets.append(set(seq1).intersection(set(seq2)))
    if set(seq1).difference(set(seq2)) not in unique_sets:
        unique_sets.append(set(seq1).difference(set(seq2)))

for it in unique_sets:
    print(it)

I got this, which is a little different from what I expected:

{'9', '5', '2', '3', '7', '1', '4', '8', '6'}
set()
{'5', '2', '3', '1', '4'}
{'9', '8', '6', '7'}
{'5', '2', '14', '3', '1', '11', '12', '4', '13'}
{'17', '16', '15'}
{'19', '20', '18'}
{'23', '21', '22'}

If I uncomment the commented-out lines in the code above, the result is even worse.

Plus, the elements in the sets I get back come out unordered. I tried the same thing with two separate lists:

seq1 = set([1,2,3,4,5,6,7,8,9])
seq2 = set([1,2,3,4,5,10,11,12])

and it worked fine: the elements never changed their position in the sets. Where is my mistake?

Thanks.
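On the ordering problem described above: a Python set does not record insertion order, so the order of a set intersection or difference is not something to rely on. The integer example most likely only looked stable because small integers hash to themselves in CPython. A minimal sketch of one way to keep the original order is to filter the first list against a set built from the second; ordered_intersection and ordered_difference are made-up helper names, not library functions.

```python
def ordered_intersection(seq1, seq2):
    """Elements of seq1 that also occur in seq2, in seq1's order."""
    lookup = set(seq2)  # set used only for fast membership tests
    return [x for x in seq1 if x in lookup]


def ordered_difference(seq1, seq2):
    """Elements of seq1 that do not occur in seq2, in seq1's order."""
    lookup = set(seq2)
    return [x for x in seq1 if x not in lookup]


seq1 = ['5', '4', '3', '2', '1', '6', '7', '8', '9']
seq3 = ['5', '4', '3', '2', '1', '11', '12', '13', '14']

common = ordered_intersection(seq1, seq3)
rest = ordered_difference(seq1, seq3)
print(common)  # ['5', '4', '3', '2', '1']
print(rest)    # ['6', '7', '8', '9']
```

The result is a list rather than a set, so it keeps the order of the first argument.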

Update: OK, now I have a slightly more complicated task, for which the suggested algorithm won't work.

I have this dictionary:

precond = {

'seq1':     ["1","2"],
'seq2':     ["3","4","2"],
'seq3':     ["5","4","2"],
'seq4':     ["6","7","4","2"],
'seq5':     ["6","4","7","2"],
'seq6':     ["6","1","8","9","10"],
'seq7':     ["6","1","8","11","9","12","13","14"],
'seq8':     ["6","1","8","11","4","15","13"],
'seq9':     ["6","1","8","16","9","11","4","17","18","2"],
'seq10':    ["6","1","8","19","9","4","16","2"],
}

I expect these sequences, each containing at least 2 elements:

[1, 2],
[4, 2],
[6, 7],
[6, 4, 7, 2],
[6, 1, 8],
[9, 10],
[6, 1, 8, 11],
[9, 12, 13, 14],
[4, 15, 13],
[16, 9, 11, 4, 17, 18, 2],
[19, 9, 4, 16, 2]

So far I have written this code:

precond = {

    'seq1':     ["1","2"],
    'seq2':     ["3","4","2"],
    'seq3':     ["5","4","2"],
    'seq4':     ["6","7","4","2"],
    'seq5':     ["6","4","7","2"],
    'seq6':     ["6","1","8","9","10"],
    'seq7':     ["6","1","8","11","9","12","13","14"],
    'seq8':     ["6","1","8","11","4","15","13"],
    'seq9':     ["6","1","8","16","9","11","4","17","18","2"],
    'seq10':    ["6","1","8","19","9","4","16","2"],
}

seq_list = []
result_seq = []
#d = []

for seq in precond.values():
    seq_list.append(seq)

#print(b)

contseq_ind = 0
control_seq = seq_list[contseq_ind]
mainseq_ind = 1
el_ind = 0
#index2 = 0

def compar():
    if control_seq[contseq_ind] != seq_list[mainseq_ind][el_ind]:
        mainseq_ind += 1
        compar()
    else:
        result_seq.append(control_seq[contseq_ind])
        contseq_ind += 1
        el_ind += 1

        if contseq_ind > len(control_seq):
            control_seq = seq_list[contseq_ind + 1]
            compar()
        else:
            compar()


compar()

This code is incomplete anyway: so far it only looks for matching elements at the beginning of the sequences, so I still need to write code that searches for a common subsequence at the end of the two compared sequences.

Right now I have a problem with recursion. Immediately after the first recursive call I get this error:

if control_seq[contseq_ind] != b[mainseq_ind][el_ind]:
UnboundLocalError: local variable 'control_seq' referenced before assignment

How can I fix this? Or maybe you have a better idea than using recursion? Thank you in advance.
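On the UnboundLocalError: compar() assigns to control_seq, contseq_ind, mainseq_ind and el_ind, and Python treats any name assigned inside a function as local to that function, so reading it before the assignment fails. A minimal sketch of the effect and two common ways around it follows; counter, bump_broken, bump_global and bump_param are made-up names for illustration only.

```python
counter = 0


def bump_broken():
    # Assigning to `counter` anywhere in the body makes it a local name,
    # so the read on the right-hand side happens before any assignment.
    counter = counter + 1


def bump_global():
    global counter  # explicitly rebind the module-level name
    counter = counter + 1


def bump_param(value):
    # Often cleaner: pass the state in and return the new state.
    return value + 1


try:
    bump_broken()
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

bump_global()
new_value = bump_param(counter)
print(error_name, counter, new_value)  # UnboundLocalError 1 2
```

Applied to compar(), that would mean either declaring the four names as global at the top of the function or passing them in as arguments. A plain while loop is another option, and it also sidesteps Python's recursion limit.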

Not sure if this is what you wanted, but it gets the same result:

from collections import OrderedDict

a = {'seq1':["5","4","3","2","1","6","7","8","9"],
     'seq2':["9","8","7","6","5","4","3","2","1"],
     'seq3':["5","4","3","2","1","11","12","13","14"],
     'seq4':["15","16","17"],
     'seq5':["18","19","20","21","22","23"],
     'seq6':["18","19","20","24","25","26"]}

level = 0
counts = OrderedDict()
# go through each value in the list of values to count the number
# of times it is used and record which list it first appeared in
for elements in a.values():
    for element in elements:
        if element in counts:
            # distinct names, so the dictionary `a` is not overwritten
            first_level, count = counts[element]
            counts[element] = first_level, count + 1
        else:
            counts[element] = (level, 1)
    level += 1

last = 0
result = []
# now break up the dictionary of unique values into lists according 
# to the count of each value and the level that they existed in 
for k,v in counts.items():
    if v == last:
        result[-1].append(k)
    else:
        result.append([k])
    last = v

print(result)

Result:

[['5', '4', '3', '2', '1'], 
 ['6', '7', '8', '9'], 
 ['11', '12', '13', '14'], 
 ['15', '16', '17'], 
 ['18', '19', '20'], 
 ['21', '22', '23'], 
 ['24', '25', '26']]
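The counting approach above gives the expected result without removing seq2 first. If the first step from the question (dropping lists that hold the same elements in a different order) is needed on its own, one possible sketch keys each list by a frozenset of its elements. unique_by_elements is a made-up helper name. The sketch assumes repeated elements inside one list do not matter, and it relies on dictionaries keeping insertion order (Python 3.7+).

```python
def unique_by_elements(seqs):
    """Keep the first list seen for each distinct set of elements."""
    seen = set()
    kept = {}
    for name, seq in seqs.items():
        key = frozenset(seq)  # order-insensitive fingerprint of the list
        if key not in seen:
            seen.add(key)
            kept[name] = seq
    return kept


a = {'seq1': ['5', '4', '3', '2', '1', '6', '7', '8', '9'],
     'seq2': ['9', '8', '7', '6', '5', '4', '3', '2', '1'],
     'seq3': ['5', '4', '3', '2', '1', '11', '12', '13', '14'],
     'seq4': ['15', '16', '17'],
     'seq5': ['18', '19', '20', '21', '22', '23'],
     'seq6': ['18', '19', '20', '24', '25', '26']}

unique = unique_by_elements(a)
print(list(unique))  # ['seq1', 'seq3', 'seq4', 'seq5', 'seq6']
```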
