Combining a list of dictionaries with another dictionary

I have a list containing a fixed number of dictionaries, which I have to compare against one other dictionary.

They have the following form (there is no specific form or pattern for the keys and values; these are randomly chosen examples):

list1 = [
    {'X1': 'Q587', 'X2': 'Q67G7', ...},
    {'AB1': 'P5K7', 'CB2': 'P678', ...},
    {'B1': 'P6H78', 'C2': 'BAA5', ...}]

dict1 = {
    'X1': set(['B00001', 'B00020', 'B00010']),
    'AB1': set(['B00001', 'B00007', 'B00003']),
    'C2': set(['B00001', 'B00002', 'B00003']),  ...
}

What I want now is a new dictionary whose keys are the values of the dictionaries in list1 and whose values are the values of dict1, but only where a key appears both in a dictionary in list1 and in dict1.

I have done this in the following way:

nDicts = len(list1)
resultDict = {}

for key in range(0, nDicts):
    for x in list1[key].keys():
        if x in dict1.keys():
            resultDict.update({list1[key][x]: dict1[x]})
            print resultDict

The desired output should be of the form:

resultDict = {
    'Q587': set(['B00001', 'B00020', 'B00010']),
    'P5K7': set(['B00001', 'B00007', 'B00003']),
    'BAA5': set(['B00001', 'B00002', 'B00003']),  ...
}

This works, but since the amount of data is so large, it takes forever. Is there a better way to do this?

EDIT: I have changed the input values a little. The only thing that matters is the keys that are shared between the dictionaries in list1 and dict1.

The keys() method in Python 2.x builds a list containing a copy of all of the keys, and you're doing this not only for each dict in list1 (probably not a big deal, but it's hard to know for sure without knowing your data), but also for dict1, over and over again.

On top of that, doing an 'in' test on a list takes a long time, because it has to check each value in the list until it finds a match (linear time), whereas an 'in' test on a dictionary is nearly instant, because it only has to look up the key's hash value (constant time on average).
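
To see the difference, here is a rough timing sketch. It assumes Python 3 and uses arbitrary illustrative sizes (100,000 string keys, 100 lookups of the last key); the absolute numbers will vary by machine, and only the gap between them matters.

```python
import timeit

# Illustrative data: 100,000 string keys (an arbitrary size chosen for the demo)
n = 100_000
keys_list = [f"K{i}" for i in range(n)]
keys_dict = dict.fromkeys(keys_list)  # same keys, stored in a hash table

target = f"K{n - 1}"  # worst case for the list: the very last element

# 'in' on a list compares elements one by one until it finds a match
list_time = timeit.timeit(lambda: target in keys_list, number=100)
# 'in' on a dict hashes the key and jumps straight to its slot
dict_time = timeit.timeit(lambda: target in keys_dict, number=100)

print(f"list: {list_time:.4f} s, dict: {dict_time:.6f} s")
```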

Both keys() calls are actually completely unnecessary: iterating a dict gives you its keys (in an unspecified order, but the same is true for calling keys()), and checking 'x in dict1' searches the same keys you'd get from keys(). So just removing them does the same thing, but more simply, faster, and with less memory used. So:

for key in range(0,nDicts):
    for x in list1[key]:
        if x in dict1:
            resultDict={list1[key][x]:dict1[x]}
            print resultDict
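
Both claims are easy to check. A short sketch, written for Python 3, where keys() returns a live view rather than a copied list (so dropping it saves less there than in Python 2, but it is still unnecessary); d is just sample data:

```python
d = {'X1': 'Q587', 'X2': 'Q67G7'}

# Iterating the dict yields the same keys, in the same order, as keys()
assert list(d) == list(d.keys())

# 'in' on the dict itself checks exactly the keys that keys() would list
for probe in ('X1', 'X2', 'AB1'):
    assert (probe in d) == (probe in d.keys())

# In Python 3, keys() is a live view of the dict, not a copied list
view = d.keys()
d['AB1'] = 'P5K7'
print(sorted(view))  # ['AB1', 'X1', 'X2']: the view sees the new key
```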

There are also ways you can simplify this that probably won't help performance that much, but are still worth doing.

You can iterate directly over list1 instead of building a huge list of all the indices and iterating that.

for list1_dict in list1:
    for x in list1_dict:
        if x in dict1:
            resultDict = {list1_dict[x]: dict1[x]}
            print resultDict

And you can get the keys and values in a single step:

for list1_dict in list1:
    for k, v in list1_dict.iteritems():
        if k in dict1:
            resultDict = {v: dict1[k]}
            print resultDict
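
In Python 3, iteritems() no longer exists; items() returns a view and serves the same purpose. A minimal sketch of the same loop in Python 3 that collects everything into one dictionary (as the question's desired output suggests) instead of printing a new one per match. The function name combine_items is hypothetical, and the data is the question's sample with the ellipses dropped.

```python
def combine_items(list1, dict1):
    """Map each list1 value to dict1[key] wherever the key is shared."""
    result = {}
    for list1_dict in list1:
        for k, v in list1_dict.items():  # items() replaces iteritems() in Python 3
            if k in dict1:
                result[v] = dict1[k]
    return result


list1 = [
    {'X1': 'Q587', 'X2': 'Q67G7'},
    {'AB1': 'P5K7', 'CB2': 'P678'},
    {'B1': 'P6H78', 'C2': 'BAA5'},
]
dict1 = {
    'X1': {'B00001', 'B00020', 'B00010'},
    'AB1': {'B00001', 'B00007', 'B00003'},
    'C2': {'B00001', 'B00002', 'B00003'},
}

print(combine_items(list1, dict1))
```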

Also, if you expect most of the keys to be found, checking for a key first and then looking it up takes about twice as long as just trying the lookup and handling the failure. (This is not true if most of the keys will not be found, however.) So:

for list1_dict in list1:
    for k, v in list1_dict.iteritems():
        try:
            resultDict = {v: dict1[k]}
            print resultDict
        except KeyError:
            pass
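
A sketch of both styles side by side, assuming Python 3 and hypothetical data in which every key is found (the case this advice targets); the function names are also hypothetical. Which style wins in practice depends on your hit rate, so the timings are printed rather than asserted; measure with your real data before choosing.

```python
import timeit


def combine_check_first(list1, dict1):
    """Check membership, then index: two hash lookups for every hit."""
    result = {}
    for list1_dict in list1:
        for k, v in list1_dict.items():
            if k in dict1:
                result[v] = dict1[k]
    return result


def combine_try_first(list1, dict1):
    """Index directly and catch KeyError: one hash lookup for every hit."""
    result = {}
    for list1_dict in list1:
        for k, v in list1_dict.items():
            try:
                result[v] = dict1[k]
            except KeyError:
                pass  # k is not shared with dict1
    return result


# Hypothetical data in which every key in list1 is also in dict1
list1 = [{f"K{i}": f"V{i}"} for i in range(10_000)]
dict1 = {f"K{i}": {i} for i in range(10_000)}

assert combine_check_first(list1, dict1) == combine_try_first(list1, dict1)
for fn in (combine_check_first, combine_try_first):
    print(fn.__name__, timeit.timeit(lambda: fn(list1, dict1), number=20))
```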

You can simplify and optimize your operation with set intersections. As of Python 2.7, dictionaries can expose their keys as a set-like view with the dict.viewkeys() method (dict.keys() in Python 3):

resultDict = {}

for d in list1:
    for sharedkey in d.viewkeys() & dict1:
        resultDict[d[sharedkey]] = dict1[sharedkey]
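
In Python 3 there is no viewkeys(); keys() already returns the set-like view, and its & operator accepts any iterable on the right, so dict1 can be passed directly and contributes its keys. A minimal sketch of the same loop in Python 3, using the question's sample data with the ellipses dropped:

```python
list1 = [
    {'X1': 'Q587', 'X2': 'Q67G7'},
    {'AB1': 'P5K7', 'CB2': 'P678'},
    {'B1': 'P6H78', 'C2': 'BAA5'},
]
dict1 = {
    'X1': {'B00001', 'B00020', 'B00010'},
    'AB1': {'B00001', 'B00007', 'B00003'},
    'C2': {'B00001', 'B00002', 'B00003'},
}

resultDict = {}
for d in list1:
    shared = d.keys() & dict1  # a plain set of the keys present in both
    print(sorted(shared))
    for sharedkey in shared:
        resultDict[d[sharedkey]] = dict1[sharedkey]
```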

This can even be turned into a dict comprehension:

resultDict = {d[sharedkey]: dict1[sharedkey] 
              for d in list1 for sharedkey in d.viewkeys() & dict1}
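
If you want to convince yourself that the comprehension computes the same thing as the original nested loop, a quick randomized check like the following (Python 3; function names and data are hypothetical) compares the two. Every value is kept unique so that duplicate result keys cannot make the outcome depend on iteration order.

```python
import random
from itertools import count


def combine_loop(list1, dict1):
    """The question's nested-loop approach, written for Python 3."""
    result = {}
    for d in list1:
        for x in d:
            if x in dict1:
                result[d[x]] = dict1[x]
    return result


def combine_comprehension(list1, dict1):
    """The dict comprehension above, with keys() in place of viewkeys()."""
    return {d[sharedkey]: dict1[sharedkey]
            for d in list1 for sharedkey in d.keys() & dict1}


# Hypothetical random data: 1,000 small dicts drawn from a pool of 5,000 keys,
# each value unique, and dict1 holding a random half of the pool.
rng = random.Random(0)
pool = [f"K{i}" for i in range(5_000)]
ids = count()
list1 = [{k: f"V{next(ids)}" for k in rng.sample(pool, 5)} for _ in range(1_000)]
dict1 = {k: {k} for k in rng.sample(pool, 2_500)}

fast = combine_comprehension(list1, dict1)
assert fast == combine_loop(list1, dict1)
print(len(fast), "entries in the combined dictionary")
```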

I am assuming here that you wanted one resulting dictionary, not a new dictionary per shared key.

Demo on your sample input:

>>> list1 = [
...     {'X1': 'AAA1', 'X2': 'BAA5'},
...     {'AB1': 'AAA1', 'CB2': 'BAA5'},
...     {'B1': 'AAA1', 'C2': 'BAA5'},
... ]
>>> dict1 = {
...     'X1': set(['B00001', 'B00002', 'B00003']),
...     'AB1': set(['B00001', 'B00002', 'B00003']),
... }
>>> {d[sharedkey]: dict1[sharedkey] 
...  for d in list1 for sharedkey in d.viewkeys() & dict1}
{'AAA1': set(['B00001', 'B00002', 'B00003'])}

Note that both X1 and AB1 are shared with dictionaries in list1, but in both cases the resulting key is AAA1. Only one of them wins (the last match), but since both values in dict1 are exactly the same anyway, that makes no difference in this case.
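
To make the "last match wins" behaviour concrete, here is a small Python 3 sketch in which the two colliding entries in dict1 are given different, made-up values. list1 is walked in order, so the value coming from the later dictionary overwrites the earlier one.

```python
list1 = [
    {'X1': 'AAA1', 'X2': 'BAA5'},
    {'AB1': 'AAA1', 'CB2': 'BAA5'},
]
# Hypothetical values that differ, so the overwrite becomes visible
dict1 = {
    'X1': {'B00001'},
    'AB1': {'B00007'},
}

result = {d[sharedkey]: dict1[sharedkey]
          for d in list1 for sharedkey in d.keys() & dict1}
print(result)  # {'AAA1': {'B00007'}}: AB1 (later in list1) overwrote X1
```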

If you wanted a separate dictionary for each dictionary in list1, simply move the for d in list1: loop out of the comprehension:

for d in list1:
    resultDict = {d[sharedkey]: dict1[sharedkey] for sharedkey in d.viewkeys() & dict1}
    if resultDict:  # can be empty
        print resultDict
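
A Python 3 sketch of this per-dictionary variant that collects the non-empty results in a list instead of printing them (per_dict is a hypothetical name), using the demo data from this answer:

```python
list1 = [
    {'X1': 'AAA1', 'X2': 'BAA5'},
    {'AB1': 'AAA1', 'CB2': 'BAA5'},
    {'B1': 'AAA1', 'C2': 'BAA5'},
]
dict1 = {
    'X1': set(['B00001', 'B00002', 'B00003']),
    'AB1': set(['B00001', 'B00002', 'B00003']),
}

per_dict = []
for d in list1:
    resultDict = {d[sharedkey]: dict1[sharedkey] for sharedkey in d.keys() & dict1}
    if resultDict:  # skip dicts in list1 that share no keys with dict1
        per_dict.append(resultDict)

print(per_dict)
```

Keeping the results in a list preserves which input dictionary each one came from, which the single merged dictionary cannot do when result keys collide.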

If you really wanted one dictionary per shared key, move the other loop out as well:

for d in list1:
    for sharedkey in d.viewkeys() & dict1:
        resultDict = {d[sharedkey]: dict1[sharedkey]}
        print resultDict

#!/usr/bin/env python

list1 = [
    {'X1': 'AAA1', 'X2': 'BAA5'},
    {'AB1': 'AAA1', 'CB2': 'BAA5'},
    {'B1': 'AAA1', 'C2': 'BAA5'},
]

dict1 = {
    'X1': set(['B00001', 'B00002', 'B00003']),
    'AB1': set(['B00001', 'B00002', 'B00003']),
}

# Lazily walk the (key, value) pairs of every dict in list1 (Python 2)
g = (k.iteritems() for k in list1)
# Keep pairs whose key is shared with dict1, re-keyed by the list1 value
ite = ((b, dict1[a]) for i in g for a, b in i if a in dict1)

d = dict(ite)
print d
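
The generator approach above relies on Python 2's iteritems(), which does not exist in Python 3. A minimal Python 3 sketch of the same lazy pipeline, assuming you want the question's output (list1 values mapped to dict1 values); itertools.chain.from_iterable flattens the per-dictionary item views into one stream of (key, value) pairs:

```python
from itertools import chain

list1 = [
    {'X1': 'AAA1', 'X2': 'BAA5'},
    {'AB1': 'AAA1', 'CB2': 'BAA5'},
    {'B1': 'AAA1', 'C2': 'BAA5'},
]
dict1 = {
    'X1': {'B00001', 'B00002', 'B00003'},
    'AB1': {'B00001', 'B00002', 'B00003'},
}

# One lazy stream of (key, value) pairs across every dict in list1
pairs = chain.from_iterable(d.items() for d in list1)
# Keep shared keys only, re-keyed by the list1 value
result = dict((v, dict1[k]) for k, v in pairs if k in dict1)
print(result)
```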
