How to check whether a subset of dictionary items exists in another list of dictionaries?
I have a list of dictionaries (called "primer names") that contains the following information:
{'part number': 1, 'notes': 'Fw Gibson primer on pEM113 to extract CmR resistance and pSC101 backbone and T7 promoter and term.', 'direction': 'fw primer', 'construct': '24', 'source': 'pEM113'}
{'part number': 1, 'notes': 'Re Gibson primer on pEM113 to extract CmR resistance and pSC101 backbone and T7 promoter and term.', 'direction': 're primer', 'construct': '24', 'source': 'pEM113'}
{'part number': 2, 'notes': 'Fw Gibson primer on BBa_K274100 to extract crtEBI operon', 'direction': 'fw primer', 'construct': '24', 'source': 'BBa_K274100'}
{'part number': 2, 'notes': 'Re Gibson primer on BBa_K274100 to extract crtEBI operon', 'direction': 're primer', 'construct': '24', 'source': 'BBa_K274100'}
{'part number': 1, 'notes': 'Fw Gibson primer on pEM114 to extract CmR resistance and pSC101 backbone and K1F promoter and term.', 'direction': 'fw primer', 'construct': '25', 'source': 'pEM114'}
{'part number': 1, 'notes': 'Re Gibson primer on pEM114 to extract CmR resistance and pSC101 backbone and K1F promoter and term.', 'direction': 're primer', 'construct': '25', 'source': 'pEM114'}
I also have another list of dictionaries (called "primer sequences") that contains the following information:
{'Part Number': '1', 'Construct Number': '24', 'Direction': 're primer', 'Primer Sequence': 'agaccgtcatctagtacctcTCTCCCTATAGTGAGTCGTATTACTCTAGAAGCGGCCGCg'}
{'Part Number': '1', 'Construct Number': '24', 'Direction': 'fw primer', 'Primer Sequence': 'tggaggatctgatataataaTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGG'}
{'Part Number': '2', 'Construct Number': '24', 'Direction': 'fw primer', 'Primer Sequence': 'TACGACTCACTATAGGGAGAgaggtactagatgacggtctgcgcaaaaaaacacgttcat'}
{'Part Number': '2', 'Construct Number': '24', 'Direction': 're primer', 'Primer Sequence': 'GGCCCCAAGGGGTTATGCTAttattatatcagatcctccagcatcaaacctgctgtcgct'}
{'Part Number': '1', 'Construct Number': '25', 'Direction': 're primer', 'Primer Sequence': 'agaccgtcatctagtacctcTCTCCCTATAGTGATAGTTATTACTCTAGAAGCGGCCGCg'}
{'Part Number': '1', 'Construct Number': '25', 'Direction': 'fw primer', 'Primer Sequence': 'tggaggatctgatataataaTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGG'}
{'Part Number': '2', 'Construct Number': '25', 'Direction': 'fw primer', 'Primer Sequence': 'TAACTATCACTATAGGGAGAgaggtactagatgacggtctgcgcaaaaaaacacgttcat'}
{'Part Number': '2', 'Construct Number': '25', 'Direction': 're primer', 'Primer Sequence': 'GGCCCCAAGGGGTTATGCTAttattatatcagatcctccagcatcaaacctgctgtcgct'}
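My goal is to combine the information from both, so that the output has the part number, construct number, direction and primer sequence from the second list together with the notes, construct and source for each primer (forward and reverse). To match the "primer names" with the "primer sequences", I have to check that their part number, construct number and direction are all the same.
I have tried to do this check with the following code, but it does not seem to work: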
for row in primers_names_list:  # recall that primers_names_list is a list of dictionaries
    if any({x['Part Number'], x['Construct Number'], x['Direction']} == {row['part number'], row['construct number'], row['direction']} for x in primers_without_names):
        primers_with_names.append({'part number':row['part number'], 'construct number':row['construct number'], 'notes':row['notes'], 'primer sequence':x['Primer Sequence']})
Can anyone give me a hint on how to do this?
Thanks a lot!
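Two problems:
1. 'part number' is an int in the primer names list but a str in the primer sequences list. For the comparison to yield True, you have to convert the int to a str (using str(val)) or the str to an int (using int(val)).
2. The key names used in your loop will raise a KeyError because they are incorrect (note that the primer sequences have 'Construct Number' whereas the primer names have 'construct').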
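Both problems can be reproduced in isolation. The following is only a minimal sketch using two hypothetical one-row samples (name_row and seq_row are illustrative names, not part of the original code) that mirror the key names and value types of the two lists:

```python
# Hypothetical one-row samples mirroring the two lists in the question.
name_row = {'part number': 1, 'construct': '24', 'direction': 'fw primer'}
seq_row = {'Part Number': '1', 'Construct Number': '24', 'Direction': 'fw primer'}

# 1. An int never compares equal to a str, even if both represent the same number.
print(name_row['part number'] == seq_row['Part Number'])        # False
print(str(name_row['part number']) == seq_row['Part Number'])   # True
print(name_row['part number'] == int(seq_row['Part Number']))   # True

# 2. Looking up a key that does not exist raises KeyError.
try:
    name_row['construct number']
except KeyError as missing:
    print('missing key:', missing)
```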
Here is a working code example:
primers_names_list = [
{'part number': 1, 'notes': 'Fw Gibson primer on pEM113 to extract CmR resistance and pSC101 backbone and T7 promoter and term.', 'direction': 'fw primer', 'construct': '24', 'source': 'pEM113'},
{'part number': 1, 'notes': 'Re Gibson primer on pEM113 to extract CmR resistance and pSC101 backbone and T7 promoter and term.', 'direction': 're primer', 'construct': '24', 'source': 'pEM113'},
{'part number': 2, 'notes': 'Fw Gibson primer on BBa_K274100 to extract crtEBI operon', 'direction': 'fw primer', 'construct': '24', 'source': 'BBa_K274100'},
{'part number': 2, 'notes': 'Re Gibson primer on BBa_K274100 to extract crtEBI operon', 'direction': 're primer', 'construct': '24', 'source': 'BBa_K274100'},
{'part number': 1, 'notes': 'Fw Gibson primer on pEM114 to extract CmR resistance and pSC101 backbone and K1F promoter and term.', 'direction': 'fw primer', 'construct': '25', 'source': 'pEM114'},
{'part number': 1, 'notes': 'Re Gibson primer on pEM114 to extract CmR resistance and pSC101 backbone and K1F promoter and term.', 'direction': 're primer', 'construct': '25', 'source': 'pEM114'},
]
primers_without_names = [
{'Part Number': '1', 'Construct Number': '24', 'Direction': 're primer', 'Primer Sequence': 'agaccgtcatctagtacctcTCTCCCTATAGTGAGTCGTATTACTCTAGAAGCGGCCGCg'},
{'Part Number': '1', 'Construct Number': '24', 'Direction': 'fw primer', 'Primer Sequence': 'tggaggatctgatataataaTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGG'},
{'Part Number': '2', 'Construct Number': '24', 'Direction': 'fw primer', 'Primer Sequence': 'TACGACTCACTATAGGGAGAgaggtactagatgacggtctgcgcaaaaaaacacgttcat'},
{'Part Number': '2', 'Construct Number': '24', 'Direction': 're primer', 'Primer Sequence': 'GGCCCCAAGGGGTTATGCTAttattatatcagatcctccagcatcaaacctgctgtcgct'},
{'Part Number': '1', 'Construct Number': '25', 'Direction': 're primer', 'Primer Sequence': 'agaccgtcatctagtacctcTCTCCCTATAGTGATAGTTATTACTCTAGAAGCGGCCGCg'},
{'Part Number': '1', 'Construct Number': '25', 'Direction': 'fw primer', 'Primer Sequence': 'tggaggatctgatataataaTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGG'},
{'Part Number': '2', 'Construct Number': '25', 'Direction': 'fw primer', 'Primer Sequence': 'TAACTATCACTATAGGGAGAgaggtactagatgacggtctgcgcaaaaaaacacgttcat'},
{'Part Number': '2', 'Construct Number': '25', 'Direction': 're primer', 'Primer Sequence': 'GGCCCCAAGGGGTTATGCTAttattatatcagatcctccagcatcaaacctgctgtcgct'},
]
primers_with_names = []
for row in primers_names_list:  # recall that primers_names_list is a list of dictionaries
    for x in primers_without_names:
        if (
            int(x['Part Number']) == row['part number'] and
            x['Construct Number'] == row['construct'] and
            x['Direction'] == row['direction']
        ):
            primers_with_names.append(
                {
                    'part number': row['part number'],
                    'construct number': row['construct'],
                    'notes': row['notes'],
                    'primer sequence': x['Primer Sequence']
                }
            )
            # If you are only expecting one match from the primers_without_names
            # collection, or wish to enforce that, you can add a break statement after
            # the insertion here to break out of the inner comparison loop and move on
            # to the next row item
for p in primers_with_names:
    print(p)
print()
print(len(primers_with_names))
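Edit: If the comparison values are unique for each row in each collection, and if you have enough memory and don't mind preprocessing the lists, another option is to convert both collections to dictionaries keyed by a (part number, construct number, direction) tuple. This reduces the lookup work to amortized O(1) per row afterwards. Overall you get roughly O(3N), i.e. linear time, instead of O(N^2), which is a considerable improvement for large collections.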
# convert both lists to dictionaries
primers_names_dict = {
    (str(p['part number']), str(p['construct']), str(p['direction'])): p
    for p in primers_names_list
}
primers_sequence_dict = {
    (str(p['Part Number']), str(p['Construct Number']), str(p['Direction'])): p
    for p in primers_without_names
}
# now that we have two dicts, we can do a key<->key match between them, so each
# comparison op is just a dictionary key lookup, which is O(1) on average
matches = []
for key in primers_names_dict:
    if key in primers_sequence_dict:  # amortized O(1) lookup
        matches.append({
            'part number': primers_names_dict[key]['part number'],
            'construct number': primers_names_dict[key]['construct'],
            'notes': primers_names_dict[key]['notes'],
            'primer sequence': primers_sequence_dict[key]['Primer Sequence']
        })
for m in matches:
    print(m)
print(len(matches))
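The question also asks for the direction and source fields in the output. One possible way to carry every field over is sketched below. It assumes Python 3.5+ (for the {**n, ...} merge); the helper names primer_key and merge_primers and the shortened sample rows are made up for illustration and do not come from the original code. Only the sequence list is indexed, so the result keeps the order of the names list.

```python
def primer_key(part, construct, direction):
    """Normalise the three matching fields into one hashable tuple of strings."""
    return (str(part), str(construct), str(direction))


def merge_primers(names, sequences):
    """Return one merged dict per name row that has a matching sequence row."""
    # Index the sequences once so that each later lookup is a single dict access.
    by_key = {
        primer_key(s['Part Number'], s['Construct Number'], s['Direction']): s
        for s in sequences
    }
    merged = []
    for n in names:
        seq = by_key.get(primer_key(n['part number'], n['construct'], n['direction']))
        if seq is None:
            continue  # no sequence for this name row; this sketch simply skips it
        # Keep every field of the name row and add the sequence to it.
        merged.append({**n, 'primer sequence': seq['Primer Sequence']})
    return merged


# Shortened, made-up sample rows with the same keys as the real data.
names = [
    {'part number': 1, 'notes': 'Fw primer', 'direction': 'fw primer', 'construct': '24', 'source': 'pEM113'},
    {'part number': 1, 'notes': 'Re primer', 'direction': 're primer', 'construct': '24', 'source': 'pEM113'},
    {'part number': 3, 'notes': 'No sequence', 'direction': 'fw primer', 'construct': '24', 'source': 'pEM113'},
]
sequences = [
    {'Part Number': '1', 'Construct Number': '24', 'Direction': 're primer', 'Primer Sequence': 'CCCC'},
    {'Part Number': '1', 'Construct Number': '24', 'Direction': 'fw primer', 'Primer Sequence': 'AAAA'},
]

result = merge_primers(names, sequences)
for row in result:
    print(row)
```

If unmatched rows should be reported rather than dropped, the continue branch is the place to collect them.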
I see two problems here.
The part number is an integer in one dictionary and a string in the other.
You wrote row['construct number'] where I think it should be row['construct'].
Fixed here:
for row in primers_names_list:  # recall that primers_names_list is a list of dictionaries
    for x in primers_without_names:
        if {x['Part Number'], x['Construct Number'], x['Direction']} == {str(row['part number']), row['construct'], row['direction']}:
            primers_with_names.append({'part number':row['part number'], 'construct number':row['construct'], 'notes':row['notes'], 'primer sequence':x['Primer Sequence']})
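One caveat about comparing with sets: a set discards order and duplicates, so it no longer records which value came from which field. With the data in the question this happens to work, but comparing tuples is the stricter check. A small sketch with made-up rows (a and b are hypothetical, with the part number and construct number swapped between them) shows the difference:

```python
# Made-up rows: the part number and construct number are swapped between them,
# so they should NOT be considered a match.
a = {'Part Number': '2', 'Construct Number': '1', 'Direction': 'fw primer'}
b = {'part number': 1, 'construct': '2', 'direction': 'fw primer'}

# Set comparison: only asks whether the same values occur, in any field.
set_match = (
    {a['Part Number'], a['Construct Number'], a['Direction']}
    == {str(b['part number']), b['construct'], b['direction']}
)

# Tuple comparison: compares field by field, position by position.
tuple_match = (
    (a['Part Number'], a['Construct Number'], a['Direction'])
    == (str(b['part number']), b['construct'], b['direction'])
)

print(set_match)    # True  (false positive)
print(tuple_match)  # False
```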