
How to check whether a subset of dictionary items exists in another list of dictionaries?

I have a list of dictionaries (called "primer names") that contains the following information:

{'part number': 1, 'notes': 'Fw Gibson primer on pEM113 to extract CmR resistance and pSC101 backbone and T7 promoter and term.', 'direction': 'fw primer', 'construct': '24', 'source': 'pEM113'}
{'part number': 1, 'notes': 'Re Gibson primer on pEM113 to extract CmR resistance and pSC101 backbone and T7 promoter and term.', 'direction': 're primer', 'construct': '24', 'source': 'pEM113'}
{'part number': 2, 'notes': 'Fw Gibson primer on BBa_K274100 to extract crtEBI operon', 'direction': 'fw primer', 'construct': '24', 'source': 'BBa_K274100'}
{'part number': 2, 'notes': 'Re Gibson primer on BBa_K274100 to extract crtEBI operon', 'direction': 're primer', 'construct': '24', 'source': 'BBa_K274100'}
{'part number': 1, 'notes': 'Fw Gibson primer on pEM114 to extract CmR resistance and pSC101 backbone and K1F promoter and term.', 'direction': 'fw primer', 'construct': '25', 'source': 'pEM114'}
{'part number': 1, 'notes': 'Re Gibson primer on pEM114 to extract CmR resistance and pSC101 backbone and K1F promoter and term.', 'direction': 're primer', 'construct': '25', 'source': 'pEM114'}

I have another list of dictionaries (called "primer sequences") that contains the following information:

{'Part Number': '1', 'Construct Number': '24', 'Direction': 're primer', 'Primer Sequence': 'agaccgtcatctagtacctcTCTCCCTATAGTGAGTCGTATTACTCTAGAAGCGGCCGCg'}
{'Part Number': '1', 'Construct Number': '24', 'Direction': 'fw primer', 'Primer Sequence': 'tggaggatctgatataataaTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGG'}
{'Part Number': '2', 'Construct Number': '24', 'Direction': 'fw primer', 'Primer Sequence': 'TACGACTCACTATAGGGAGAgaggtactagatgacggtctgcgcaaaaaaacacgttcat'}
{'Part Number': '2', 'Construct Number': '24', 'Direction': 're primer', 'Primer Sequence': 'GGCCCCAAGGGGTTATGCTAttattatatcagatcctccagcatcaaacctgctgtcgct'}
{'Part Number': '1', 'Construct Number': '25', 'Direction': 're primer', 'Primer Sequence': 'agaccgtcatctagtacctcTCTCCCTATAGTGATAGTTATTACTCTAGAAGCGGCCGCg'}
{'Part Number': '1', 'Construct Number': '25', 'Direction': 'fw primer', 'Primer Sequence': 'tggaggatctgatataataaTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGG'}
{'Part Number': '2', 'Construct Number': '25', 'Direction': 'fw primer', 'Primer Sequence': 'TAACTATCACTATAGGGAGAgaggtactagatgacggtctgcgcaaaaaaacacgttcat'}
{'Part Number': '2', 'Construct Number': '25', 'Direction': 're primer', 'Primer Sequence': 'GGCCCCAAGGGGTTATGCTAttattatatcagatcctccagcatcaaacctgctgtcgct'}

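My goal is to combine the information from both lists, so that for each primer (forward and reverse) the output has the part number, construct number, direction and primer sequence from the bottom list, plus the notes, construct and source. To match a "primer names" entry with a "primer sequences" entry, I have to check that their part number, construct number and direction are all the same.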
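For reference, here is a minimal sketch of the combined record described above, assuming one "primer names" row and its matching "primer sequences" row have already been found. The helper name combine_rows and the output key names are illustrative choices, not part of the question's code.

```python
def combine_rows(name_row, seq_row):
    """Merge one matched 'primer names' row with its 'primer sequences' row.

    Hypothetical helper: output keys follow the fields listed in the question.
    """
    return {
        'part number': name_row['part number'],
        'construct number': seq_row['Construct Number'],
        'direction': name_row['direction'],
        'primer sequence': seq_row['Primer Sequence'],
        'notes': name_row['notes'],
        'construct': name_row['construct'],
        'source': name_row['source'],
    }


# One matching pair taken from the two lists in the question
name_row = {'part number': 2, 'notes': 'Fw Gibson primer on BBa_K274100 to extract crtEBI operon',
            'direction': 'fw primer', 'construct': '24', 'source': 'BBa_K274100'}
seq_row = {'Part Number': '2', 'Construct Number': '24', 'Direction': 'fw primer',
           'Primer Sequence': 'TACGACTCACTATAGGGAGAgaggtactagatgacggtctgcgcaaaaaaacacgttcat'}

combined = combine_rows(name_row, seq_row)
print(combined)
```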

I have tried to check this with the following code, but it doesn't seem to work:

for row in primers_names_list: #recall that primers_names_list is a list of dictionaries
    if any({x['Part Number'], x['Construct Number'], x['Direction']} == {row['part number'], row['construct number'], row['direction']} for x in primers_without_names):
        primers_with_names.append({'part number':row['part number'], 'construct number':row['construct number'], 'notes':row['notes'], 'primer sequence':x['Primer Sequence']})
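Besides the key and type problems pointed out in the answers, x in the code above is the loop variable of the generator expression passed to any(), so it is not defined on the append line (this raises a NameError, or silently reuses a stale x if one exists in the enclosing scope). One way to test for a match and retrieve it at the same time is next() with a default value. A sketch, assuming the key names and str/int handling from the answers; find_sequence_row is a hypothetical helper name and the rows are trimmed to the fields it uses.

```python
def find_sequence_row(row, sequence_rows):
    """Return the first 'primer sequences' row matching `row`, or None.

    `row` uses the 'primer names' keys; part numbers are compared as strings
    because the two lists store them as int and str respectively.
    """
    key = (str(row['part number']), row['construct'], row['direction'])
    return next(
        (x for x in sequence_rows
         if (x['Part Number'], x['Construct Number'], x['Direction']) == key),
        None,  # returned when no row matches
    )


primers_without_names = [
    {'Part Number': '1', 'Construct Number': '24', 'Direction': 're primer',
     'Primer Sequence': 'agaccgtcatctagtacctcTCTCCCTATAGTGAGTCGTATTACTCTAGAAGCGGCCGCg'},
    {'Part Number': '1', 'Construct Number': '24', 'Direction': 'fw primer',
     'Primer Sequence': 'tggaggatctgatataataaTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGG'},
]

row = {'part number': 1, 'direction': 'fw primer', 'construct': '24', 'source': 'pEM113'}
match = find_sequence_row(row, primers_without_names)
if match is not None:
    print(match['Primer Sequence'])
```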

Could anyone give me a hint on how to do this?

Thanks a lot!

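Two problems:

  1. 'part number' is an int in "primer names" but a str in "primer sequences". For the comparison to produce True, you have to convert the int to a str (with str(val)) or the str to an int (with int(val)).

  2. The key names you use in the loop raise a KeyError because they are wrong (note that "primer sequences" has 'Construct Number', while "primer names" has 'construct').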
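To illustrate the first point: in Python an int never compares equal to a str, so 1 == '1' is False. A minimal sketch that builds one normalized match key per row, so both lists can be compared directly; names_key and sequences_key are hypothetical helper names, and the rows are trimmed to the relevant fields.

```python
print(1 == '1')       # False: int and str never compare equal
print(str(1) == '1')  # True after converting the int
print(1 == int('1'))  # True after converting the str


def names_key(row):
    """Match key for a 'primer names' row (part number stored as int)."""
    return (str(row['part number']), row['construct'], row['direction'])


def sequences_key(row):
    """Match key for a 'primer sequences' row (all values stored as str)."""
    return (row['Part Number'], row['Construct Number'], row['Direction'])


name_row = {'part number': 1, 'direction': 'fw primer', 'construct': '25', 'source': 'pEM114'}
seq_row = {'Part Number': '1', 'Construct Number': '25', 'Direction': 'fw primer'}
print(names_key(name_row) == sequences_key(seq_row))  # True
```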

Here is a working code example:

primers_names_list = [
{'part number': 1, 'notes': 'Fw Gibson primer on pEM113 to extract CmR resistance and pSC101 backbone and T7 promoter and term.', 'direction': 'fw primer', 'construct': '24', 'source': 'pEM113'},
{'part number': 1, 'notes': 'Re Gibson primer on pEM113 to extract CmR resistance and pSC101 backbone and T7 promoter and term.', 'direction': 're primer', 'construct': '24', 'source': 'pEM113'},
{'part number': 2, 'notes': 'Fw Gibson primer on BBa_K274100 to extract crtEBI operon', 'direction': 'fw primer', 'construct': '24', 'source': 'BBa_K274100'},
{'part number': 2, 'notes': 'Re Gibson primer on BBa_K274100 to extract crtEBI operon', 'direction': 're primer', 'construct': '24', 'source': 'BBa_K274100'},
{'part number': 1, 'notes': 'Fw Gibson primer on pEM114 to extract CmR resistance and pSC101 backbone and K1F promoter and term.', 'direction': 'fw primer', 'construct': '25', 'source': 'pEM114'},
{'part number': 1, 'notes': 'Re Gibson primer on pEM114 to extract CmR resistance and pSC101 backbone and K1F promoter and term.', 'direction': 're primer', 'construct': '25', 'source': 'pEM114'},
]

primers_without_names = [
{'Part Number': '1', 'Construct Number': '24', 'Direction': 're primer', 'Primer Sequence': 'agaccgtcatctagtacctcTCTCCCTATAGTGAGTCGTATTACTCTAGAAGCGGCCGCg'},
{'Part Number': '1', 'Construct Number': '24', 'Direction': 'fw primer', 'Primer Sequence': 'tggaggatctgatataataaTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGG'},
{'Part Number': '2', 'Construct Number': '24', 'Direction': 'fw primer', 'Primer Sequence': 'TACGACTCACTATAGGGAGAgaggtactagatgacggtctgcgcaaaaaaacacgttcat'},
{'Part Number': '2', 'Construct Number': '24', 'Direction': 're primer', 'Primer Sequence': 'GGCCCCAAGGGGTTATGCTAttattatatcagatcctccagcatcaaacctgctgtcgct'},
{'Part Number': '1', 'Construct Number': '25', 'Direction': 're primer', 'Primer Sequence': 'agaccgtcatctagtacctcTCTCCCTATAGTGATAGTTATTACTCTAGAAGCGGCCGCg'},
{'Part Number': '1', 'Construct Number': '25', 'Direction': 'fw primer', 'Primer Sequence': 'tggaggatctgatataataaTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGG'},
{'Part Number': '2', 'Construct Number': '25', 'Direction': 'fw primer', 'Primer Sequence': 'TAACTATCACTATAGGGAGAgaggtactagatgacggtctgcgcaaaaaaacacgttcat'},
{'Part Number': '2', 'Construct Number': '25', 'Direction': 're primer', 'Primer Sequence': 'GGCCCCAAGGGGTTATGCTAttattatatcagatcctccagcatcaaacctgctgtcgct'},
]


primers_with_names = []
for row in primers_names_list: #recall that primers_names_list is a list of dictionaries
    for x in primers_without_names:
        if (
            int(x['Part Number']) == row['part number'] and
            x['Construct Number'] == row['construct'] and
            x['Direction'] == row['direction']
        ):
            primers_with_names.append(
                {
                    'part number': row['part number'], 
                    'construct number': row['construct'], 
                    'notes': row['notes'], 
                    'primer sequence':x['Primer Sequence']
                }
            )
            # If you are only expecting one match from the primers_without_names
            # collection, or wish to enforce that, you can add a break statement after
            # the insertion here to break out of the inner comparison loop and move on
            # to the next row item


for p in primers_with_names:
    print(p)

print()
print(len(primers_with_names))
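The comment inside the loop above mentions adding a break when at most one match is expected. A sketch of that variant, which also uses Python's for/else to report rows with no match and carries over 'direction' and 'source' (fields the question asks for but the example output leaves out). The data is trimmed to two rows of each list from the question.

```python
primers_names_list = [
    {'part number': 2, 'notes': 'Fw Gibson primer on BBa_K274100 to extract crtEBI operon',
     'direction': 'fw primer', 'construct': '24', 'source': 'BBa_K274100'},
    {'part number': 2, 'notes': 'Re Gibson primer on BBa_K274100 to extract crtEBI operon',
     'direction': 're primer', 'construct': '24', 'source': 'BBa_K274100'},
]
primers_without_names = [
    {'Part Number': '2', 'Construct Number': '24', 'Direction': 'fw primer',
     'Primer Sequence': 'TACGACTCACTATAGGGAGAgaggtactagatgacggtctgcgcaaaaaaacacgttcat'},
    {'Part Number': '2', 'Construct Number': '24', 'Direction': 're primer',
     'Primer Sequence': 'GGCCCCAAGGGGTTATGCTAttattatatcagatcctccagcatcaaacctgctgtcgct'},
]

primers_with_names = []
for row in primers_names_list:
    for x in primers_without_names:
        if (
            int(x['Part Number']) == row['part number'] and
            x['Construct Number'] == row['construct'] and
            x['Direction'] == row['direction']
        ):
            primers_with_names.append({
                'part number': row['part number'],
                'construct number': row['construct'],
                'direction': row['direction'],
                'source': row['source'],
                'notes': row['notes'],
                'primer sequence': x['Primer Sequence'],
            })
            break  # first match found: skip the remaining sequence rows
    else:
        # for/else: this runs only when the inner loop ended without a break
        print('no sequence found for', row['construct'], row['part number'], row['direction'])

for p in primers_with_names:
    print(p)
print(len(primers_with_names))
```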

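Edit: if the comparison values are unique for each row in each collection, and you have enough memory and don't mind preprocessing the lists, another option is to convert both collections into dictionaries keyed by a (part number, construct number, direction) tuple. Each later lookup is then an average O(1) dictionary access. Overall the work is linear, O(N + M), roughly three passes (two to build the dictionaries, one to look up matches), instead of the O(N × M) of the nested loops, which is a large improvement for big collections.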

# convert both lists to dictionaries
primers_names_dict = { 
    (str(p['part number']), str(p['construct']), str(p['direction'])): p
    for p in primers_names_list 
}
primers_sequence_dict = {
    (str(p['Part Number']), str(p['Construct Number']), str(p['Direction'])): p
    for p in primers_without_names
}


# now that we have two dicts, we can do a key<->key match between them, so each
# comparison op is just a dictionary key lookup, which is O(1) on average
matches = []
for key in primers_names_dict.keys():
    if key in primers_sequence_dict: # amortized O(1) lookup
        matches.append( {
            'part number': primers_names_dict[key]['part number'], 
            'construct number': primers_names_dict[key]['construct'], 
            'notes': primers_names_dict[key]['notes'], 
            'primer sequence': primers_sequence_dict[key]['Primer Sequence']
        } )

for m in matches:
    print(m)
print(len(matches))
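The dictionary approach relies on the uniqueness assumption stated above: a dict comprehension keeps only the last row for a repeated key and silently drops the others. A sketch of a check to run before building the index, using collections.Counter; duplicate_keys is a hypothetical helper name, and the third row below is a made-up duplicate added only to show the effect.

```python
from collections import Counter


def duplicate_keys(rows, key_func):
    """Return the match keys that occur more than once in `rows`."""
    counts = Counter(key_func(r) for r in rows)
    return [k for k, n in counts.items() if n > 1]


def sequences_key(row):
    """Same tuple key as primers_sequence_dict in the answer above."""
    return (str(row['Part Number']), str(row['Construct Number']), str(row['Direction']))


rows = [
    {'Part Number': '1', 'Construct Number': '24', 'Direction': 're primer'},
    {'Part Number': '1', 'Construct Number': '24', 'Direction': 'fw primer'},
    {'Part Number': '1', 'Construct Number': '24', 'Direction': 're primer'},  # hypothetical duplicate
]

dupes = duplicate_keys(rows, sequences_key)
print(dupes)  # [('1', '24', 're primer')]

index = {sequences_key(r): r for r in rows}
print(len(rows), len(index))  # 3 2: one row was silently overwritten
```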

I see two problems here.

  1. The part number is an integer in one dictionary and a string in the other.

  2. You wrote row['construct number'] where I think it should be row['construct'].

Fixed here:

primers_with_names = []
for row in primers_names_list: #recall that primers_names_list is a list of dictionaries
    for x in primers_without_names:
        # compare as tuples so each value is checked against the same field
        if (x['Part Number'], x['Construct Number'], x['Direction']) == (str(row['part number']), row['construct'], row['direction']):
            primers_with_names.append({'part number':row['part number'], 'construct number':row['construct'], 'notes':row['notes'], 'primer sequence':x['Primer Sequence']})
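Comparing the three values as sets, as the question's code does, ignores which field each value came from, so two rows whose values are swapped between fields compare equal. Tuples compare position by position. The values below are hypothetical and only illustrate the difference; the question's data happens not to contain such a collision.

```python
# Hypothetical keys: part 24 of construct 1 vs. part 1 of construct 24
a = ('24', '1', 'fw primer')  # (part number, construct number, direction)
b = ('1', '24', 'fw primer')

print(set(a) == set(b))  # True: sets only check membership, not position
print(a == b)            # False: tuples compare field by field
```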


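Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.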

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM