Match set of dictionaries. Most elegant solution. Python

Given two lists of dictionaries, a new one and an old one. The dictionaries represent the same objects in both lists.
I need to find the differences and produce a new list of dictionaries that contains only the objects from the new list, updated with the attributes from the old dictionaries.
Example:

list_new = [
    {'id': 1,
     'name': 'bob',
     'desc': 'cool guy'},
    {'id': 2,
     'name': 'Bill',
     'desc': 'bad guy'},
    {'id': 3,
     'name': 'Vasya',
     'desc': None},
]

list_old = [
    {'id': 1,
     'name': 'boby',
     'desc': 'cool guy',
     'some_data': '12345'},
    {'id': 2,
     'name': 'Bill',
     'desc': 'cool guy',
     'some_data': '12345'},
    {'id': 3,
     'name': 'vasya',
     'desc': 'the man',
     'some_data': '12345'},
    {'id': 4,
     'name': 'Elvis',
     'desc': 'singer',
     'some_data': '12345'},
]
            

In this example I want to produce a new list that contains only the people from list_new, with their data updated from list_old. Records are matched by id. So bob becomes boby, Bill becomes a 'cool guy', and Vasya becomes 'the man'. Elvis must be absent.

I'm looking for an elegant solution with as few loops as possible.

Here is how I solved it, although it is not the best way:

def match_dict(new_list, old_list):
    # collect the ids that are present in the new list
    ids_new = []
    for item in new_list:
        ids_new.append(item['id'])
    result = []
    for item_old in old_list:
        if item_old['id'] in ids_new:
            for item_new in new_list:
                if item_new['id'] == item_old['id']:
                    item_new['some_data'] = item_old['some_data']
                    result.append(item_new)
    return result

I have doubts about this because there is a loop inside a loop. With lists of 2000 items, the process would take some time.

Can't quite get it to one line, but here's a simpler version:

def match_new(new_list, old_list) :
    ids = dict((item['id'], item) for item in new_list)
    return [ids[item['id']] for item in old_list if item['id'] in ids]

Not knowing the constraints of your data, I will suppose that id is unique in each list, and that the dictionaries in your lists contain only immutable values (string, int, ...), which are hashable.

# first index each list by id
new = {item['id']: item for item in list_new}
old = {item['id']: item for item in list_old}

# now you can see which ids appeared in the new list
created = set(new.keys())-set(old.keys())
# or which ids were deleted
deleted =  set(old.keys())-set(new.keys())
# or which ids exists in the 2 lists
intersect = set(new.keys()).intersection(set(old.keys()))

# using the same 'conversion to set' trick,
# you can see what is different for each item
diff = {id: dict(set(new[id].items())-set(old[id].items())) for id in intersect}

# using your example data set, diff now contains the differences for items which exists in the two lists:
# {1: {'name': 'bob'}, 2: {'desc': 'bad guy'}, 3: {'name': 'Vasya', 'desc': None}}

# you can now add the new ids to this diff
diff.update({id: new[id] for id in created})
# and get your data back into the original format:
list_diff = [dict(data, **{'id': id}) for id,data in diff.items()]

This uses Python 3 syntax, but should be easy to port to Python 2.
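To make the "conversion to set" trick more concrete, here is a minimal sketch on a single pair of records (id 3 from the question). `dict.items()` yields (key, value) pairs, and subtracting the two sets keeps only the pairs of the new record that differ from, or are missing in, the old one. It assumes every value is hashable, as supposed above; the helper name `changed_fields` is made up for illustration.

```python
def changed_fields(new_item, old_item):
    """Return the key/value pairs of new_item that are absent from old_item."""
    # Every (key, value) pair must be hashable to be stored in a set.
    return dict(set(new_item.items()) - set(old_item.items()))


new_item = {'id': 3, 'name': 'Vasya', 'desc': None}
old_item = {'id': 3, 'name': 'vasya', 'desc': 'the man', 'some_data': '12345'}

# 'id' is identical in both records, so it is dropped from the result;
# 'some_data' exists only in the old record, so it never appears either.
print(changed_fields(new_item, old_item))
```

This is also why the final step adds `'id'` back to each entry: the set difference removes it whenever both records share the same id.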

Edit: here is the same code written for Python 2.5:

new = dict((item['id'],item) for item in list_new)
old = dict((item['id'],item) for item in list_old)

created = set(new.keys())-set(old.keys())
deleted =  set(old.keys())-set(new.keys())
intersect = set(new.keys()).intersection(set(old.keys()))

diff = dict((id,dict(set(new[id].items())-set(old[id].items()))) for id in intersect)

diff.update(dict((id, new[id]) for id in created))
list_diff = [dict(data, **{'id': id}) for id,data in diff.items()]

(Note how the code is less readable without the dict comprehension.)

For each dictionary in old_list, search for the dictionary in new_list with the same id, then do: old_dict.update(new_dict)

After updating, remove each new_dict from new_list, and append the remaining, unused dicts after the loop.
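A minimal sketch of these two steps, under a few assumptions: ids are unique within each list, a dictionary indexed by id stands in for the linear search through new_list, and each old record is copied rather than updated in place. The function name `update_old_with_new` and the id 5 record are made up for illustration. Note that with this approach the values from the new list win on conflicts, and old entries without a match (such as Elvis) are kept.

```python
def update_old_with_new(old_list, new_list):
    # Index the new records by id so each lookup is a single dict access.
    unused = {item['id']: item for item in new_list}
    result = []
    for old_item in old_list:
        merged = dict(old_item)          # copy, so the input is not mutated
        new_item = unused.pop(merged['id'], None)
        if new_item is not None:
            merged.update(new_item)      # new values overwrite old ones
        result.append(merged)
    # Records that were never matched exist only in new_list; append them.
    result.extend(dict(item) for item in unused.values())
    return result


old_list = [{'id': 1, 'name': 'boby', 'some_data': '12345'},
            {'id': 4, 'name': 'Elvis', 'some_data': '12345'}]
new_list = [{'id': 1, 'name': 'bob'},
            {'id': 5, 'name': 'Ann'}]   # hypothetical record, new list only

for row in update_old_with_new(old_list, new_list):
    print(row)
```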

Something like this is what you need:

l = []
for d in list_old:
    for e in list_new:
        if e['id'] == d['id']:
            l.append(dict(e, **d))
print(l)

Read here on how to merge dictionaries.
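Since the merge is the core of the snippet above, here is a small sketch of how `dict(e, **d)` behaves. It builds a new dictionary from `e` and then applies the keys of `d`, so `d` wins on conflicts and neither input is modified. The `**` form requires every key of `d` to be a string; the copy-and-update variant shown second has no such restriction.

```python
e = {'id': 1, 'name': 'bob', 'desc': 'cool guy'}       # record from list_new
d = {'id': 1, 'name': 'boby', 'desc': 'cool guy',
     'some_data': '12345'}                             # record from list_old

# New dict: starts as a copy of e, then d overrides and extends it.
merged = dict(e, **d)
print(merged)

# Equivalent merge that also works when the keys are not strings.
merged_copy = e.copy()
merged_copy.update(d)
print(merged_copy == merged)

# The inputs are left untouched.
print(e['name'])
```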

You could do something like this:

def match_dict(new_list, old_list):
    new_dict = dict((obj['id'], obj) for obj in new_list)
    old_dict = dict((obj['id'], obj) for obj in old_list)
    # iterate over a copy of the keys, because entries may be deleted in the loop
    for k in list(new_dict):
        if k in old_dict:
            new_dict[k].update(old_dict[k])
        else:
            del new_dict[k]
    return list(new_dict.values())

If you are doing this often, I would suggest storing your data as dictionaries with the id as the key instead of as lists. That way you wouldn't have to convert it each time.

Edit: here is an example showing how to store the data in a dictionary.

list_new = [{'desc': 'cool guy', 'id': 1, 'name': 'bob'}, {'desc': 'bad guy', 'id': 2, 'name': 'Bill'}, {'desc': None, 'id': 3, 'name': 'Vasya'}]
# create a dictionary with the value of 'id' as the key
dict_new = dict((obj['id'], obj) for obj in list_new)
# now you can access entries by their id instead of having to loop through the list
print(dict_new[2])
# {'desc': 'bad guy', 'id': 2, 'name': 'Bill'}

Steps:

  • Create a lookup dictionary for list_old, keyed by id
  • Loop through the list_new dicts, creating a merged dict for each one that also exists in list_old

Code:

def match_dict(new_list, old_list): 
    old = dict((v['id'], v) for v in old_list)
    return [dict(d, **old[d['id']]) for d in new_list if d['id'] in old]

EDIT: fixed incorrectly named variables inside the function.
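For illustration, this is how the function behaves on a shortened version of the data from the question (the function is repeated so the snippet runs on its own). Values from the old list override those from the new list, and ids that exist only in the old list are skipped. The key order shown in the comments assumes Python 3.7 or later, where dicts preserve insertion order.

```python
def match_dict(new_list, old_list):
    old = dict((v['id'], v) for v in old_list)
    return [dict(d, **old[d['id']]) for d in new_list if d['id'] in old]


list_new = [{'id': 1, 'name': 'bob', 'desc': 'cool guy'},
            {'id': 2, 'name': 'Bill', 'desc': 'bad guy'}]
list_old = [{'id': 1, 'name': 'boby', 'desc': 'cool guy', 'some_data': '12345'},
            {'id': 2, 'name': 'Bill', 'desc': 'cool guy', 'some_data': '12345'},
            {'id': 4, 'name': 'Elvis', 'desc': 'singer', 'some_data': '12345'}]

result = match_dict(list_new, list_old)
for row in result:
    print(row)
# {'id': 1, 'name': 'boby', 'desc': 'cool guy', 'some_data': '12345'}
# {'id': 2, 'name': 'Bill', 'desc': 'cool guy', 'some_data': '12345'}
```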

You'd be much better off if your top-level data structure was a dict rather than a list. Then it would be:

dict_new.update(dict_old)

However, for what you actually have, try this:

result_list = []
for item in list_new:
    found_item = [d for d in list_old if d["id"] == item["id"]]
    if found_item:
        result_list.append(dict(item, **found_item[0]))

This actually still has a loop inside a loop (the inner loop is "hidden" in the list comprehension), so it's still O(n**2). On large data sets it would undoubtedly be noticeably faster to convert it to a dict, update that, and then convert it back to a list.
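A rough sketch of that conversion, assuming ids are unique within each list; the name `merge_via_dict` is hypothetical. Indexing list_old once turns each lookup into a constant-time dictionary access, so the merge takes one pass over each list instead of a nested scan.

```python
def merge_via_dict(list_new, list_old):
    # Step 1: convert the old list into a dict keyed by id (one pass).
    old_by_id = {d['id']: d for d in list_old}
    # Step 2: merge each new item with its old counterpart, if there is one.
    merged_by_id = {}
    for item in list_new:
        old_item = old_by_id.get(item['id'])
        if old_item is not None:
            merged_by_id[item['id']] = dict(item, **old_item)
    # Step 3: convert the result back to a list.
    return list(merged_by_id.values())


list_new = [{'id': 1, 'name': 'bob'}, {'id': 3, 'name': 'Vasya'}]
list_old = [{'id': 1, 'name': 'boby', 'some_data': '12345'},
            {'id': 4, 'name': 'Elvis', 'some_data': '12345'}]

print(merge_via_dict(list_new, list_old))
```

Only id 1 appears in both lists here, so the result holds a single merged record; id 3 (new only) and id 4 (old only) are left out.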

You might like this one:

def match_dict(new_list, old_list):
    id_new = [item_new.get("id") for item_new in new_list]
    id_old = [item_old.get("id") for item_old in old_list]

    for idx_old in id_old:
        if idx_old in id_new:
            # updates the matching dict of new_list in place
            new_list[id_new.index(idx_old)].update(old_list[id_old.index(idx_old)])

    return new_list

from pprint import pprint
pprint(match_dict(list_new, list_old))

Output:

[{'desc': 'cool guy', 'id': 1, 'name': 'boby', 'some_data': '12345'},
 {'desc': 'cool guy', 'id': 2, 'name': 'Bill', 'some_data': '12345'},
 {'desc': 'the man', 'id': 3, 'name': 'vasya', 'some_data': '12345'}]

[od for od in list_old if od['id'] in {nd['id'] for nd in list_new}]
