Merge two dictionaries based on similarity excluding a key
I have the following three dictionaries in an array:
items = [
    {
        'FirstName': 'David',
        'LastName': 'Smith',
        'Language': set(['en'])
    },
    {
        'FirstName': 'David',
        'LastName': 'Smith',
        'Language': set(['fr'])
    },
    {
        'FirstName': 'Bob',
        'LastName': 'Jones',
        'Language': set(['en'])
    }
]
I want to merge these dictionaries whenever two of them are identical apart from a specified key, and combine the values of that key. Using the "Language" key, the array would be merged into the following:
[
    {
        'FirstName': 'David',
        'LastName': 'Smith',
        'Language': set(['en','fr'])
    },
    {
        'FirstName': 'Bob',
        'LastName': 'Jones',
        'Language': set(['en'])
    }
]
Here is what I'm currently doing:
from copy import deepcopy
def _merge_items_on_field(items, field):
    '''Given an array of dicts, merge the
    dicts together if they are the same except for the 'field'.
    If merging dicts, add the unique values of that field together.'''
    items = deepcopy(items)
    items_merged_on_field = []
    for item in items:
        # Remove that key/value from the dict
        field_value = item.pop(field)
        # Get an array of items *without* that field to compare against
        items_without_field = deepcopy(items_merged_on_field)
        # Explicit loop: on Python 3 a bare map() is lazy and would never run the pops
        for d in items_without_field:
            d.pop(field)
        # If the dict item is found ("else"), add the fields together
        # If not ("except"), then add in the dict item to the array
        try:
            index = items_without_field.index(item)
        except ValueError:
            item[field] = field_value
            items_merged_on_field.append(item)
        else:
            items_merged_on_field[index][field] = items_merged_on_field[index][field].union(field_value)
    return items_merged_on_field
>>> items = [{'LastName': 'Smith', 'Language': set(['en']), 'FirstName': 'David'}, {'LastName': 'Smith', 'Language': set(['fr']), 'FirstName': 'David'}, {'LastName': 'Jones', 'Language': set(['en']), 'FirstName': 'Bob'}]
>>> _merge_items_on_field(items, 'Language')
[{'LastName': 'Smith', 'Language': set(['fr', 'en']), 'FirstName': 'David'}, {'LastName': 'Jones', 'Language': set(['en']), 'FirstName': 'Bob'}]
This seems a bit complicated -- is there a better way to do this?
There are a couple of ways of doing this. The most painless method I know of uses the pandas library, in particular a groupby + apply.
import pandas as pd
merged = (
    pd.DataFrame(items)
    .groupby(['FirstName', 'LastName'], sort=False)
    .Language
    .apply(lambda x: set.union(*x))
    .reset_index()
    .to_dict(orient='records')
)
print(merged)
[
    {'FirstName': 'David', 'LastName': 'Smith', 'Language': {'en', 'fr'}},
    {'FirstName': 'Bob', 'LastName': 'Jones', 'Language': {'en'}}
]
The other method (that I mentioned) uses itertools.groupby, but seeing as you have 30 columns to group on, I'd just recommend sticking to pandas.
If you want to turn this into a function:
def merge(items, field):
    df = pd.DataFrame(items)
    columns = df.columns.difference([field]).tolist()
    return (
        df.groupby(columns, sort=False)[field]
        .apply(lambda x: set.union(*x))
        .reset_index()
        .to_dict(orient='records')
    )
merged = merge(items, 'Language')
print(merged)
[
    {'FirstName': 'David', 'LastName': 'Smith', 'Language': {'en', 'fr'}},
    {'FirstName': 'Bob', 'LastName': 'Jones', 'Language': {'en'}}
]
You can use itertools.groupby:
import itertools
d = [{'FirstName': 'David', 'LastName': 'Smith', 'Language': {'en'}}, {'FirstName': 'David', 'LastName': 'Smith', 'Language': {'fr'}}, {'FirstName': 'Bob', 'LastName': 'Jones', 'Language': {'en'}}]
# Group on both name fields; groupby only merges adjacent items, so sort first
name = lambda x: (x['FirstName'], x['LastName'])
groups = itertools.groupby(sorted(d, key=name), key=name)
# Union every language in each group, not just the first one per item
final_dict = [{'FirstName': f, 'LastName': l, 'Language': set().union(*(i['Language'] for i in g))} for (f, l), g in groups]
Output:
[{'FirstName': 'Bob', 'LastName': 'Jones', 'Language': {'en'}}, {'FirstName': 'David', 'LastName': 'Smith', 'Language': {'en', 'fr'}}]
If pandas is not an option:
from itertools import groupby
from functools import reduce
arr = [
    {'FirstName': 'David', 'LastName': 'Smith', 'Language': set(['en'])},
    {'FirstName': 'David', 'LastName': 'Smith', 'Language': set(['fr'])},
    {'FirstName': 'David', 'LastName': 'Jones', 'Language': set(['sp'])}
]
def reduce_field(items, field, op=set.union, sort=False):
    def _key(d):
        # Everything except the merged field identifies a group
        return tuple((k, v) for k, v in d.items() if k != field)
    # groupby only merges adjacent items; pass sort=True if matches may be scattered
    if sort:
        items = sorted(items, key=_key)
    res = []
    for k, g in groupby(items, key=_key):
        d = dict(k)
        d[field] = reduce(op, (el[field] for el in g))
        res.append(d)
    return res
reduce_field(arr, 'Language')
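A note on why reduce_field takes a sort flag: itertools.groupby only groups consecutive items with equal keys, so duplicates that are not next to each other end up in separate groups. A minimal sketch of that behaviour follows; the records list and the first helper are made up purely for illustration.

```python
from itertools import groupby

# Hypothetical sample data: the two 'a' records are not adjacent.
records = [('a', 1), ('b', 2), ('a', 3)]

def first(pair):
    """Grouping key: the first element of each pair."""
    return pair[0]

# Without sorting, 'a' appears as two separate groups.
unsorted_keys = [k for k, _ in groupby(records, key=first)]

# Sorting by the same key first makes equal keys adjacent.
sorted_keys = [k for k, _ in groupby(sorted(records, key=first), key=first)]

print(unsorted_keys)  # ['a', 'b', 'a']
print(sorted_keys)    # ['a', 'b']
```

So with sort=False the input is assumed to already have matching dictionaries next to each other; otherwise sort=True is the safer choice, provided the remaining values can be compared with each other.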
You can try it manually:
new_dict = {}
d = [{'FirstName': 'David', 'LastName': 'Smith', 'Language': {'en'}},
     {'FirstName': 'David', 'LastName': 'Smith', 'Language': {'fr'}},
     {'FirstName': 'Bob', 'LastName': 'Jones', 'Language': {'en'}}]
for i in d:
    key = (i['FirstName'], i['LastName'])
    if key not in new_dict:
        # Copy so the input dictionaries are not modified by later merges
        new_dict[key] = dict(i)
    else:
        new_dict[key]['Language'] = new_dict[key]['Language'] | i['Language']
print(new_dict.values())
Output:
# dict_values([{'FirstName': 'Bob',
# 'LastName': 'Jones',
# 'Language': {'en'}},
# {'FirstName': 'David',
# 'LastName': 'Smith',
# 'Language': {'fr', 'en'}}])
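The manual approach hardcodes FirstName and LastName as the grouping key. If there are many other columns, the same idea might be generalised by keying on every (key, value) pair except the merged field. This is only a sketch: the name merge_on_field is hypothetical, it assumes all values other than the merged field are hashable, and it assumes the merged field always holds a set.

```python
def merge_on_field(items, field):
    """Merge dicts that are equal apart from `field`, unioning that field's sets."""
    merged = {}
    for item in items:
        # Key on every other (key, value) pair; frozenset ignores key order.
        # Assumption: all values other than `field` are hashable.
        key = frozenset((k, v) for k, v in item.items() if k != field)
        if key in merged:
            # `|` builds a new set, so the input sets are never mutated.
            merged[key][field] = merged[key][field] | item[field]
        else:
            # Shallow copy so the input dictionaries are left untouched.
            merged[key] = dict(item)
    return list(merged.values())

items = [
    {'FirstName': 'David', 'LastName': 'Smith', 'Language': {'en'}},
    {'FirstName': 'David', 'LastName': 'Smith', 'Language': {'fr'}},
    {'FirstName': 'Bob', 'LastName': 'Jones', 'Language': {'en'}},
]
result = merge_on_field(items, 'Language')
print(result)
```

Unlike the itertools.groupby versions, this needs no sorting, and on Python 3.7+ the result keeps the order in which each group first appeared.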