
Merging two lists of dicts based on common values

So I have two lists of dicts, as follows:

list1 = [
{'name':'john',
'gender':'male',
'grade': 'third'
},
{'name':'cathy',
'gender':'female',
'grade':'second'
},
]

list2 = [
{'name':'john',
'physics':95,
'chemistry':89
},
{'name':'cathy',
'physics':78,
'chemistry':69
},
]

The output list I need is as follows:

final_list = [
{'name':'john',
'gender':'male',
'grade':'third',
'marks': {'physics':95, 'chemistry': 89}
},
{'name':'cathy',
'gender':'female',
'grade':'second',
'marks': {'physics':78, 'chemistry': 69}
},
]

First I tried iterating, as follows:

final_list = []
for item1 in list1:
    for item2 in list2:
        if item1['name'] == item2['name']:
            temp = dict(item2)  # copy so list2 is not modified
            temp.pop('name')
            final_list.append(dict(name=item1['name'], **temp))

However, this does not give me the desired result. I also tried pandas, though I have limited experience with it:

>>> import pandas as pd
>>> df1 = pd.DataFrame(list1)
>>> df2 = pd.DataFrame(list2)
>>> result = pd.merge(df1, df2, on=['name'])

However, I don't know how to get the data back into the format I need. Any help would be appreciated.
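For comparison, here is a minimal sketch of how the nested-loop attempt could be adjusted to produce the nested marks structure directly. It keeps the O(n·m) double loop, copies each dict from list1 so the inputs are not modified, and assumes each name occurs at most once in list2.

```python
list1 = [
    {'name': 'john', 'gender': 'male', 'grade': 'third'},
    {'name': 'cathy', 'gender': 'female', 'grade': 'second'},
]
list2 = [
    {'name': 'john', 'physics': 95, 'chemistry': 89},
    {'name': 'cathy', 'physics': 78, 'chemistry': 69},
]

final_list = []
for item1 in list1:
    for item2 in list2:
        if item1['name'] == item2['name']:
            # Copy item1 so list1 is left untouched
            merged = dict(item1)
            # Every key of item2 except 'name' goes under 'marks'
            merged['marks'] = {k: v for k, v in item2.items() if k != 'name'}
            final_list.append(merged)
            break  # assumes each name occurs at most once in list2

print(final_list)
```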

You can first merge both dataframes:

In [144]: df = pd.DataFrame(list1).merge(pd.DataFrame(list2))

which looks like this:

In [145]: df
Out[145]:
   gender   grade   name  chemistry  physics
0    male   third   john         89       95
1  female  second  cathy         69       78

Then create a marks column holding the marks as a dict (as written, each value is a dict wrapped in a one-element list):

In [146]: df['marks'] = df.apply(lambda x: [x[['chemistry', 'physics']].to_dict()], axis=1)

In [147]: df
Out[147]:
   gender   grade   name  chemistry  physics  \
0    male   third   john         89       95
1  female  second  cathy         69       78

                                  marks
0  [{u'chemistry': 89, u'physics': 95}]
1  [{u'chemistry': 69, u'physics': 78}]

Finally, call to_dict(orient='records') on the selected columns of the dataframe:

In [148]: df[['name', 'gender', 'grade', 'marks']].to_dict(orient='records')
Out[148]:
[{'gender': 'male',
  'grade': 'third',
  'marks': [{'chemistry': 89L, 'physics': 95L}],
  'name': 'john'},
 {'gender': 'female',
  'grade': 'second',
  'marks': [{'chemistry': 69L, 'physics': 78L}],
  'name': 'cathy'}]
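If you want marks to be a plain dict, as in the question, rather than a one-element list, one option is to build the column from the subject columns in one go instead of with a row-wise apply. This is a sketch assuming the subject columns are exactly physics and chemistry; the name subjects is introduced here for illustration.

```python
import pandas as pd

list1 = [
    {'name': 'john', 'gender': 'male', 'grade': 'third'},
    {'name': 'cathy', 'gender': 'female', 'grade': 'second'},
]
list2 = [
    {'name': 'john', 'physics': 95, 'chemistry': 89},
    {'name': 'cathy', 'physics': 78, 'chemistry': 69},
]

df = pd.DataFrame(list1).merge(pd.DataFrame(list2), on='name')

subjects = ['physics', 'chemistry']  # assumed: the columns to nest under 'marks'
# One plain dict per row, in row order, stored in an object column
df['marks'] = df[subjects].to_dict(orient='records')

# Drop the now-redundant flat columns and convert back to a list of dicts
final_list = df.drop(columns=subjects).to_dict(orient='records')
print(final_list)
```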

Using your pandas approach, you can call

result.to_dict(orient='records')

to get it back as a list of dictionaries. It won't put marks in as a sub-field, though, since nothing tells it to do that: physics and chemistry will just be fields on the same level as the rest.
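A short sketch of that flat result, using the two lists from the question; note that physics and chemistry end up as top-level keys next to gender and grade.

```python
import pandas as pd

list1 = [
    {'name': 'john', 'gender': 'male', 'grade': 'third'},
    {'name': 'cathy', 'gender': 'female', 'grade': 'second'},
]
list2 = [
    {'name': 'john', 'physics': 95, 'chemistry': 89},
    {'name': 'cathy', 'physics': 78, 'chemistry': 69},
]

result = pd.merge(pd.DataFrame(list1), pd.DataFrame(list2), on=['name'])
records = result.to_dict(orient='records')
# Each record is flat: there is no nested 'marks' key
print(records)
```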

You may also be having problems if a name is spelled differently in the two lists (for example 'cathy ' with a trailing space in the first and 'kathy' in the second); such entries naturally won't get merged.

Since you want a list of dicts as output, you can easily do this without pandas: use a dict keyed by name to store all the info, and make a single pass over each list instead of the O(n^2) nested loops in your own code:

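# Index the dicts of list1 by name (the original dicts, not copies)
out = {d["name"]: d for d in list1}
for d in list2:
    # Remove "name" from the list2 dict and attach the rest as "marks"
    out[d.pop("name")]["marks"] = d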


from pprint import pprint as pp

pp(list(out.values()))

Output:

[{'gender': 'female',
  'grade': 'second',
  'marks': {'chemistry': 69, 'physics': 78},
  'name': 'cathy'},
 {'gender': 'male',
  'grade': 'third',
  'marks': {'chemistry': 89, 'physics': 95},
  'name': 'john'}]

That reuses (and modifies) the dicts in your lists. If you want to create new dicts instead:

out = {d["name"]: d.copy() for d in list1}

for d in list2:
    marks = d.copy()  # copy first so the dicts in list2 are not modified
    k = marks.pop("name")
    out[k]["marks"] = marks

from pprint import pprint as pp

pp(list(out.values()))

The output is the same:

[{'gender': 'female',
  'grade': 'second',
  'marks': {'chemistry': 69, 'physics': 78},
  'name': 'cathy'},
 {'gender': 'male',
  'grade': 'third',
  'marks': {'chemistry': 89, 'physics': 95},
  'name': 'john'}]
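The same single-pass idea can be wrapped in a reusable function. This is a sketch; merge_by_key and its key/nested parameters are hypothetical names, and it assumes entries of the second list with no match in the first should simply be skipped. Neither input list is modified.

```python
from pprint import pprint


def merge_by_key(base, extra, key='name', nested='marks'):
    """Hypothetical helper: copy each dict in `base` and attach the
    non-key fields of the matching dict in `extra` under `nested`."""
    # Index copies of the base dicts by key: one pass over each list
    out = {d[key]: dict(d) for d in base}
    for d in extra:
        target = out.get(d[key])
        if target is None:
            continue  # no matching entry in base; skip it
        target[nested] = {k: v for k, v in d.items() if k != key}
    return list(out.values())


list1 = [
    {'name': 'john', 'gender': 'male', 'grade': 'third'},
    {'name': 'cathy', 'gender': 'female', 'grade': 'second'},
]
list2 = [
    {'name': 'john', 'physics': 95, 'chemistry': 89},
    {'name': 'cathy', 'physics': 78, 'chemistry': 69},
]

final_list = merge_by_key(list1, list2)
pprint(final_list)
```

On Python 3.7 and later, dicts keep insertion order, so the result follows the order of list1.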

Create a function that adds a marks column; this column should contain a dictionary of the physics and chemistry marks:

def create_marks(row):
    # With axis=1, apply passes each row as a Series; store both marks in one dict
    row['marks'] = {'chemistry': row['chemistry'], 'physics': row['physics']}
    return row

result_with_marks = result.apply( create_marks , axis = 1)

Out[19]:
gender  grade   name    chemistry   physics            marks
male    third   john    89             95   {u'chemistry': 89, u'physics': 95}
female  second  cathy   69             78   {u'chemistry': 69, u'physics': 78}

Then convert it to your desired result as follows:

result_with_marks.drop( ['chemistry' , 'physics'], axis = 1).to_dict(orient = 'records')

Out[20]:
[{'gender': 'male',
  'grade': 'third',
  'marks': {'chemistry': 89L, 'physics': 95L},
  'name': 'john'},
 {'gender': 'female',
  'grade': 'second',
  'marks': {'chemistry': 69L, 'physics': 78L},
  'name': 'cathy'}]

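Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original URL. For any questions, contact yoyou2525@163.com.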

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM