Nested dictionary

Question

I am working on some FASTA-like sequences (not FASTA itself, but a similar format I have defined for some culled PDB entries from the PISCES server).

I have a small number of sequences called nCatSeq, and for each of them there are multiple nBasinSeq. I go through a large PDB file and want to extract, for each nCatSeq, the corresponding nBasinSeq values into a dictionary without redundancies. The code snippet that does this is given below.

nCatSeq=item[1][n]+item[1][n+1]+item[1][n+2]+item[1][n+3]
nBasinSeq=item[2][n]+item[2][n+1]+item[2][n+2]+item[2][n+3]
if nCatSeq not in potBasin:
    potBasin[nCatSeq]=nBasinSeq
else:   
    if nBasinSeq not in potBasin[nCatSeq]:
        potBasin[nCatSeq]=potBasin[nCatSeq],nBasinSeq
    else:
        pass

I get the following as the result for one nCatSeq:

'4241': ((('VUVV', 'DDRV'), 'DDVG'), 'VUVV')

What I want, however, is:

'4241': ('VUVV', 'DDRV', 'DDVG', 'VUVV')

I don't want all the extra brackets caused by the following statement:

potBasin[nCatSeq]=potBasin[nCatSeq],nBasinSeq

(see the code snippet above)

Is there a way to do this?

Answer 1

The problem is that using a comma to "append" an element creates a new tuple every time. To solve this, use a list and append:

nCatSeq=item[1][n]+item[1][n+1]+item[1][n+2]+item[1][n+3]
nBasinSeq=item[2][n]+item[2][n+1]+item[2][n+2]+item[2][n+3]
if nCatSeq not in potBasin:
    potBasin[nCatSeq]=[nBasinSeq]
elif nBasinSeq not in potBasin[nCatSeq]:
    potBasin[nCatSeq].append(nBasinSeq)
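
To make the difference concrete, here is a minimal sketch with made-up values (the names `nested` and `flat` are illustrative and do not come from the question). It also shows a side effect of the nesting: `in` only inspects the top level of a tuple, which is probably why 'VUVV' shows up twice in the question's output.

```python
# Illustrative values only; `nested` and `flat` are hypothetical names.
nested = 'VUVV'
for seq in ('DDRV', 'DDVG'):
    # `a, b` builds a brand-new 2-tuple whose first element is the old value.
    nested = nested, seq

flat = ['VUVV']
for seq in ('DDRV', 'DDVG'):
    # append mutates the existing list, so no extra nesting appears.
    flat.append(seq)

print(nested)            # (('VUVV', 'DDRV'), 'DDVG')
print(flat)              # ['VUVV', 'DDRV', 'DDVG']

# Membership only looks at the top-level elements of the tuple.
print('VUVV' in nested)  # False
print('VUVV' in flat)    # True
```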

Even better, instead of making potBasin a normal dictionary, make it a defaultdict. The code can then be simplified to:

# init stuff
from collections import defaultdict
potBasin = defaultdict(list)

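# inside loop
nCatSeq=item[1][n]+item[1][n+1]+item[1][n+2]+item[1][n+3]
nBasinSeq=item[2][n]+item[2][n+1]+item[2][n+2]+item[2][n+3]
# missing keys start as an empty list; keep the check so repeats are skipped
if nBasinSeq not in potBasin[nCatSeq]:
    potBasin[nCatSeq].append(nBasinSeq)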
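
A self-contained sketch of the defaultdict approach, keeping a membership test so that repeated values are skipped. The `pairs` list is made up for illustration and stands in for the (nCatSeq, nBasinSeq) values that the real loop builds from `item`.

```python
from collections import defaultdict

# Hypothetical (nCatSeq, nBasinSeq) pairs standing in for the values built in the loop.
pairs = [('4241', 'VUVV'), ('4241', 'DDRV'), ('4241', 'DDVG'), ('4241', 'VUVV')]

potBasin = defaultdict(list)
for nCatSeq, nBasinSeq in pairs:
    # Accessing a missing key creates an empty list, so no "first time" branch is needed.
    if nBasinSeq not in potBasin[nCatSeq]:
        potBasin[nCatSeq].append(nBasinSeq)

print(dict(potBasin))  # {'4241': ['VUVV', 'DDRV', 'DDVG']}
```

If a tuple is needed at the end, each list can be converted once after the loop with `tuple(...)`.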

Answer 2

You can add them as tuples:

if nCatSeq not in potBasin:
    potBasin[nCatSeq] = (nBasinSeq,)
else:
    if nBasinSeq not in potBasin[nCatSeq]:
        potBasin[nCatSeq] = potBasin[nCatSeq] + (nBasinSeq,)

That way, rather than:

(('VUVV', 'DDRV'), 'DDVG')
# you will get
('VUVV', 'DDRV', 'DDVG') # == ('VUVV', 'DDRV')+ ('DDVG',)
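
The trailing comma in `(nBasinSeq,)` matters here. A small sketch with made-up values (the name `basins` is illustrative) shows what happens with and without it:

```python
basins = ('VUVV', 'DDRV')

# The trailing comma is what makes a one-element tuple; ('DDVG') is just a string.
assert isinstance(('DDVG',), tuple)
assert isinstance(('DDVG'), str)

# Concatenating two tuples yields a new, flat tuple.
basins = basins + ('DDVG',)
print(basins)  # ('VUVV', 'DDRV', 'DDVG')

# Without the comma the operands are a tuple and a str, which cannot be added.
try:
    basins + ('VUVV')
except TypeError as exc:
    print('TypeError:', exc)
```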

Answer 3

Your question boils down to flattening a nested tuple and eliminating redundant entries:

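def flatten(nested, answer=None):
    # Collect every string inside an arbitrarily nested tuple into one flat list.
    if answer is None:
        answer = []
    if isinstance(nested, tuple):
        # Recurse into each element, sharing the same accumulator list.
        for n in nested:
            flatten(n, answer)
    else:
        # A bare string (e.g. a key that only ever received one value) is a leaf.
        answer.append(nested)
    return answer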

So, with your nested dictionary:

for k in nested_dict:
    nested_dict[k] = tuple(flatten(nested_dict[k]))

If you want to eliminate duplicate entries (note that a set does not preserve the original order):

for k in nested_dict:
    nested_dict[k] = tuple(set(flatten(nested_dict[k])))
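
If the original order of the entries matters, one alternative to `set` is `dict.fromkeys`, since dict keys are unique and keep insertion order in Python 3.7 and later. This is a sketch on a made-up, already flattened list (`flat` and `unique_in_order` are illustrative names), not part of the original answer:

```python
# Made-up flattened values for one key; the repeated 'VUVV' should be dropped.
flat = ['VUVV', 'DDRV', 'DDVG', 'VUVV']

# dict keys are unique and keep insertion order, so the first occurrence wins.
unique_in_order = tuple(dict.fromkeys(flat))
print(unique_in_order)  # ('VUVV', 'DDRV', 'DDVG')
```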

Hope this helps.

3 answers

Answer 1: 5 2012-10-08 16:07:06

Answer 2: 1 (accepted) 2012-10-08 16:08:59

Answer 3: 0 2012-10-08 16:09:53
