简体   繁体   English

在 Python 中使用单个键合并两个不同长度的字典列表

[英]Merge two lists of dicts of different lengths using a single key in Python

I want to merge two lists of dictionaries on a single key, when the two lists are different lengths (using Python 3.6).当两个列表的长度不同(使用 Python 3.6)时,我想在一个键上合并两个字典列表。 For example, if we have a list of dicts called l1 :例如,如果我们有一个名为l1的字典列表:

l1 = [{'pcd_sector': 'ABDC', 'coverage_2014': '100'},
       {'pcd_sector': 'DEFG', 'coverage_2014': '0'}]

and another list of dicts called l2 :和另一个名为l2的字典列表:

l2 = [{'pcd_sector': 'ABDC', 'asset': '3G', 'asset_id': '2gs'},
      {'pcd_sector': 'ABDC', 'asset': '4G', 'asset_id': '7jd'},
      {'pcd_sector': 'DEFG', 'asset': '3G', 'asset_id': '3je'},
      {'pcd_sector': 'DEFG', 'asset': '4G', 'asset_id': '8js'},
      {'pcd_sector': 'CDEF', 'asset': '3G', 'asset_id': '4jd'}]

How would one merge them using pcd_sector to get this(?):如何使用pcd_sector合并它们来获得这个(?):

result = [{'pcd_sector': 'ABDC', 'asset': '3G', 'asset_id': '2gs', 'coverage_2014': '100'},
          {'pcd_sector': 'ABDC', 'asset': '4G', 'asset_id': '7jd', 'coverage_2014': '100'},
          {'pcd_sector': 'DEFG', 'asset': '3G', 'asset_id': '3je', 'coverage_2014': '0'},
          {'pcd_sector': 'DEFG', 'asset': '4G', 'asset_id': '8js', 'coverage_2014': '0'},
          {'pcd_sector': 'CDEF', 'asset': '3G', 'asset_id': '4jd'}]

What I have tried so far到目前为止我尝试过的

I've used the following code to merge the two lists, but I end up with a short version unfortunately, not the desired complete data structure.我使用以下代码合并了两个列表,但不幸的是我最终得到了一个简短版本,而不是所需的完整数据结构。

import pprint
grouped = {}
for d in l1 + l2:
    grouped.setdefault(d['pcd_sector'], {'asset':0, 'asset_id':0, 'coverage_2014':0}).update(d)
result = [d for d in grouped.values()]
pprint.pprint(result)

So when I run the code, I end up with this short output:所以当我运行代码时,我最终得到了这个简短的输出:

result = [{'pcd_sector': 'ABDC', 'asset': '3G', 'asset_id': '2gs', 'coverage_2014': '100'},
         {'pcd_sector': 'DEFG', 'asset': '4G', 'asset_id': '8js', 'coverage_2014': '0'},
         {'pcd_sector': 'CDEF', 'asset': '3G', 'asset_id': '4jd'}]

Problem问题

The problem in your approach is that your data is put in a grouped dict with 'pcd_sector' as keys but your l2 has multiple dicts with the same 'pcd_sector' .您的方法中的问题是您的数据被放入一个以'pcd_sector'作为键的grouped字典中,但您的l2有多个具有相同'pcd_sector'字典。 You could use a tuple of 'pcd_sector', 'asset' as key for l2 , but it wouldn't work for l1 anymore.您可以使用'pcd_sector', 'asset'元组作为l2键,但它不再适用于l1了。 So you need to do the processing in two steps instead of iterating on l1 + l2 directly.所以需要分两步进行处理,而不是直接对l1 + l2进行迭代。

Theory理论

If pcd_sector keys are unique in l1 , you can create a big dict instead of a list of small dicts:如果pcd_sector键在l1是唯一的,您可以创建一个大字典而不是小字典列表:

>>> d1 = {d['pcd_sector']:d for d in l1}
>>> d1
{'ABDC': {'pcd_sector': 'ABDC', 'coverage_2014': '100'}, 'DEFG': {'pcd_sector': 'DEFG', 'coverage_2014': '0'}}

Then, you simply need to merge the dicts that have the same pcd_sector keys:然后,您只需要合并具有相同pcd_sector键的字典:

>>> [dict(d, **d1.get(d['pcd_sector'], {})) for d in l2]
[{'asset_id': '2gs', 'coverage_2014': '100', 'pcd_sector': 'ABDC', 'asset': '3G'}, {'asset_id': '7jd', 'coverage_2014': '100', 'pcd_sector': 'ABDC', 'asset': '4G'}, {'asset_id': '3je', 'coverage_2014': '0', 'pcd_sector': 'DEFG', 'asset': '3G'}, {'asset_id': '8js', 'coverage_2014': '0', 'pcd_sector': 'DEFG', 'asset': '4G'}, {'asset_id': '4jd', 'pcd_sector': 'CDEF', 'asset': '3G'}]

Complete code完整代码

Putting it all together, the code becomes:把它们放在一起,代码变成:

l1 = [{'pcd_sector': 'ABDC', 'coverage_2014': '100'},
       {'pcd_sector': 'DEFG', 'coverage_2014': '0'}]

l2 = [{'pcd_sector': 'ABDC', 'asset': '3G', 'asset_id': '2gs'},
      {'pcd_sector': 'ABDC', 'asset': '4G', 'asset_id': '7jd'},
      {'pcd_sector': 'DEFG', 'asset': '3G', 'asset_id': '3je'},
      {'pcd_sector': 'DEFG', 'asset': '4G', 'asset_id': '8js'},
      {'pcd_sector': 'CDEF', 'asset': '3G', 'asset_id': '4jd'}]

d1 = {d['pcd_sector']:d for d in l1}
result = [dict(d, **d1.get(d['pcd_sector'], {})) for d in l2]

import pprint
pprint.pprint(result)
#   [{'asset': '3G',
#     'asset_id': '2gs',
#     'coverage_2014': '100',
#     'pcd_sector': 'ABDC'},
#    {'asset': '4G',
#     'asset_id': '7jd',
#     'coverage_2014': '100',
#     'pcd_sector': 'ABDC'},
#    {'asset': '3G',
#     'asset_id': '3je',
#     'coverage_2014': '0',
#     'pcd_sector': 'DEFG'},
#    {'asset': '4G',
#     'asset_id': '8js',
#     'coverage_2014': '0',
#     'pcd_sector': 'DEFG'},
#    {'asset': '3G', 'asset_id': '4jd', 'pcd_sector': 'CDEF'}]

You can create a lookup dictionary based on pcd_sector and just update your original list of dicts based on that:您可以创建一个基于pcd_sector的查找字典,并根据它更新您的原始字典列表:

>>> import copy
>>> lookup = { x['pcd_sector'] : x for x in l1 }
>>> result = copy.deepcopy(l2)
>>> for d in result:
...     d.update(lookup.get(d['pcd_sector'], {})) # golfed courtesy Ashwini Chaudhary
... 
>>> result
[{'pcd_sector': 'ABDC', 'asset': '3G', 'asset_id': '2gs', 'coverage_2014': '100'}, 
{'pcd_sector': 'ABDC', 'asset': '4G', 'asset_id': '7jd', 'coverage_2014': '100'}, 
{'pcd_sector': 'DEFG', 'asset': '3G', 'asset_id': '3je', 'coverage_2014': '0'}, 
{'pcd_sector': 'DEFG', 'asset': '4G', 'asset_id': '8js', 'coverage_2014': '0'},
{'pcd_sector': 'CDEF', 'asset': '3G', 'asset_id': '4jd'}]

A solution using pandas :使用pandas的解决方案:

import pandas as pd

df1 = pd.DataFrame(l1)
df2 = pd.DataFrame(l2)
dfr = df1.join(df2, how='outer')
print(dfr)

Output:输出:

  coverage_2014 pcd_sector asset asset_id
0           100       ABDC    3G      2gs
1           100       ABDC    4G      7jd
2             0       DEFG    3G      3je
3             0       DEFG    4G      8js
4           NaN       CDEF    3G      4jd

If you want it as a dictionary again:如果你又想把它当作字典:

result = dfr.to_dict('records')
print(result)

Output (with linebreaks added):输出(添加换行符):

[{'coverage_2014': '100', 'pcd_sector': 'ABDC', 'asset': '3G', 'asset_id': '2gs'},
 {'coverage_2014': '100', 'pcd_sector': 'ABDC', 'asset': '4G', 'asset_id': '7jd'},
 {'coverage_2014': '0', 'pcd_sector': 'DEFG', 'asset': '3G', 'asset_id': '3je'},
 {'coverage_2014': '0', 'pcd_sector': 'DEFG', 'asset': '4G', 'asset_id': '8js'},
 {'coverage_2014': nan, 'pcd_sector': 'CDEF', 'asset': '3G', 'asset_id': '4jd'}]

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM