
Nested dictionary

I am working on some FASTA-like sequences (not FASTA, but something I have defined that's similar for some culled PDB from the PISCES server).

I have a question. I have a small number of sequences called nCatSeq, and for each of them there are multiple nBasinSeq. I go through a large PDB file and, for each nCatSeq, I want to collect the corresponding nBasinSeq values in a dictionary without redundancies. The code snippet that does this is given below.

nCatSeq=item[1][n]+item[1][n+1]+item[1][n+2]+item[1][n+3]
nBasinSeq=item[2][n]+item[2][n+1]+item[2][n+2]+item[2][n+3]
if nCatSeq not in potBasin:
    potBasin[nCatSeq]=nBasinSeq
else:   
    if nBasinSeq not in potBasin[nCatSeq]:
        potBasin[nCatSeq]=potBasin[nCatSeq],nBasinSeq
    else:
        pass

I get the following as the answer for one nCatSeq,

'4241': ((('VUVV', 'DDRV'), 'DDVG'), 'VUVV')

What I want, however, is:

'4241': ('VUVV', 'DDRV', 'DDVG', 'VUVV')

I don't want all the extra brackets due to the following command

potBasin[nCatSeq]=potBasin[nCatSeq],nBasinSeq 

(see above code snippet)

Is there a way to do this?

The problem is that using a comma to "append" an element creates a new two-element tuple every time, with the old value nested inside it. To avoid this, store the values in a list and use append:

nCatSeq=item[1][n]+item[1][n+1]+item[1][n+2]+item[1][n+3]
nBasinSeq=item[2][n]+item[2][n+1]+item[2][n+2]+item[2][n+3]
if nCatSeq not in potBasin:
    potBasin[nCatSeq]=[nBasinSeq]
elif nBasinSeq not in potBasin[nCatSeq]:
    potBasin[nCatSeq].append(nBasinSeq)
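
To see why the comma version nests while the list version stays flat, here is a minimal sketch. The strings come from the output in the question; the variable names as_tuple and as_list are only illustrative and do not appear in the original code.

```python
# Pairing with a comma builds a NEW 2-tuple whose first element is the old value,
# so every "append" adds one more level of nesting.
as_tuple = 'VUVV'
as_tuple = as_tuple, 'DDRV'
as_tuple = as_tuple, 'DDVG'

# A list is mutated in place by append, so it stays flat.
as_list = ['VUVV']
as_list.append('DDRV')
as_list.append('DDVG')

print(as_tuple)  # (('VUVV', 'DDRV'), 'DDVG')
print(as_list)   # ['VUVV', 'DDRV', 'DDVG']
```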

Even better, make potBasin a defaultdict instead of a normal dictionary, so that a missing key starts out as an empty list. The code can then be simplified to:

# init stuff
from collections import defaultdict
potBasin = defaultdict(list)

# inside loop
nCatSeq=item[1][n]+item[1][n+1]+item[1][n+2]+item[1][n+3]
nBasinSeq=item[2][n]+item[2][n+1]+item[2][n+2]+item[2][n+3]
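# the membership check is still needed to avoid redundant entries
if nBasinSeq not in potBasin[nCatSeq]:
    potBasin[nCatSeq].append(nBasinSeq)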
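
For a version that can be run end to end, the following is a rough sketch under some assumptions: each item is taken to be a record whose item[1] and item[2] are equal-length strings, the four-character window is built by slicing rather than by concatenating four indices, and the function name collect_basins and the sample records are hypothetical, not from the question.

```python
from collections import defaultdict

def collect_basins(items, width=4):
    """Map each window of item[1] to the distinct windows of item[2] seen with it."""
    pot_basin = defaultdict(list)
    for item in items:
        cat, basin = item[1], item[2]
        for n in range(len(cat) - width + 1):
            # For strings, slicing gives the same result as item[1][n]+...+item[1][n+3]
            n_cat_seq = cat[n:n + width]
            n_basin_seq = basin[n:n + width]
            # Skip basin sequences already recorded for this key
            if n_basin_seq not in pot_basin[n_cat_seq]:
                pot_basin[n_cat_seq].append(n_basin_seq)
    return dict(pot_basin)

# Hypothetical records: (identifier, category string, basin string)
records = [
    ("rec1", "42414", "VUVVD"),
    ("rec2", "4241", "DDRV"),
    ("rec3", "4241", "VUVV"),  # same basin as in rec1 for '4241', so it is skipped
]
result = collect_basins(records)
print(result)  # {'4241': ['VUVV', 'DDRV'], '2414': ['UVVD']}
```

If the order of the basin sequences does not matter, defaultdict(set) with add would make the membership check unnecessary.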

You can also keep tuples, as long as you concatenate a one-element tuple instead of pairing the values with a comma:

if nCatSeq not in potBasin:
    potBasin[nCatSeq] = (nBasinSeq,)
else:
    if nBasinSeq not in potBasin[nCatSeq]:
        potBasin[nCatSeq] = potBasin[nCatSeq] + (nBasinSeq,)

That way, rather than:

(('VUVV', 'DDRV'), 'DDVG')
# you will get
('VUVV', 'DDRV', 'DDVG') # == ('VUVV', 'DDRV')+ ('DDVG',)

Your question boils down to flattening a nested tuple and eliminating redundant entries:

def flatten(nested, answer=None):
    # Recursively collect the leaves of arbitrarily nested tuples into a flat list.
    if answer is None:
        answer = []
    if isinstance(nested, tuple):
        for item in nested:
            flatten(item, answer)
    else:
        # a bare string (a key that was seen only once) is a leaf
        answer.append(nested)
    return answer

So, with your nested dictionary:

for k in nested_dict:
    nested_dict[k] = tuple(flatten(nested_dict[k]))

If you also want to eliminate duplicate entries:

for k in nested_dict:
    nested_dict[k] = tuple(set(flatten(nested_dict[k])))
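
One caveat: a set does not keep insertion order, so the tuples built with set may come out in any order. If the original order matters (an assumption about what is wanted here), one option is dict.fromkeys, which keeps the first occurrence of each entry in order on Python 3.7 and later. The sketch below is self-contained and uses its own generator-based flatten, which is a variant written for illustration rather than the function shown earlier.

```python
def flatten(nested):
    """Yield the leaves of arbitrarily nested tuples, left to right."""
    if isinstance(nested, tuple):
        for item in nested:
            yield from flatten(item)
    else:
        yield nested

# Sample value taken from the output shown in the question
nested_dict = {'4241': ((('VUVV', 'DDRV'), 'DDVG'), 'VUVV')}

for k in nested_dict:
    # dict.fromkeys drops repeats while preserving first-seen order
    nested_dict[k] = tuple(dict.fromkeys(flatten(nested_dict[k])))

print(nested_dict)  # {'4241': ('VUVV', 'DDRV', 'DDVG')}
```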

Hope this helps
