Remove duplicates from list of dictionaries created using groupby itertools in Python
I want to remove some duplicates in my merged dictionary.
My data:
mongo_data = [{
    'url': 'https://goodreads.com/',
    'variables': [{'key': 'Harry Potter', 'value': '10.0'},
                  {'key': 'Discovery of Witches', 'value': '8.5'}],
    'vendor': 'Fantasy'
}, {
    'url': 'https://goodreads.com/',
    'variables': [{'key': 'Hunger Games', 'value': '10.0'},
                  {'key': 'Maze Runner', 'value': '5.5'}],
    'vendor': 'Dystopia'
}, {
    'url': 'https://kindle.com/',
    'variables': [{'key': 'Divergent', 'value': '9.0'},
                  {'key': 'Lord of the Rings', 'value': '9.0'}],
    'vendor': 'Fantasy'
}, {
    'url': 'https://kindle.com/',
    'variables': [{'key': 'The Handmaids Tale', 'value': '10.0'},
                  {'key': 'Divergent', 'value': '9.0'}],
    'vendor': 'Fantasy'
}]
My code:
from itertools import groupby

searches = []
# Group records by URL, then by vendor within each URL
for key, group in groupby(mongo_data, key=lambda chunk: chunk['url']):
    search = {"url": key, "results": []}
    for vendor, group2 in groupby(group, key=lambda chunk2: chunk2['vendor']):
        result = {
            "genre": vendor,
            # Flatten the 'variables' lists of every record in this group
            "data": [{'key': var['key'], 'value': var['value']}
                     for result2 in group2
                     for var in result2["variables"]],
        }
        search["results"].append(result)
    searches.append(search)
My result:
[
    {
        "url": "https://goodreads.com/",
        "results": [
            {
                "genre": "Fantasy",
                "data": [
                    {
                        "key": "Harry Potter",
                        "value": "10.0"
                    },
                    {
                        "key": "Discovery of Witches",
                        "value": "8.5"
                    }
                ]
            },
            {
                "genre": "Dystopia",
                "data": [
                    {
                        "key": "Hunger Games",
                        "value": "10.0"
                    },
                    {
                        "key": "Maze Runner",
                        "value": "5.5"
                    }
                ]
            }
        ]
    },
    {
        "url": "https://kindle.com/",
        "results": [
            {
                "genre": "Fantasy",
                "data": [
                    {
                        "key": "Divergent",
                        "value": "9.0"
                    },
                    {
                        "key": "Lord of the Rings",
                        "value": "9.0"
                    },
                    {
                        "key": "The Handmaids Tale",
                        "value": "10.0"
                    },
                    {
                        "key": "Divergent",
                        "value": "9.0"
                    }
                ]
            }
        ]
    }
]
I do not want any duplicates in my structure, and I am not sure how to take them out. My expected result can be seen below.
Expected result:
[
    {
        "url": "https://goodreads.com/",
        "results": [
            {
                "genre": "Fantasy",
                "data": [
                    {
                        "key": "Harry Potter",
                        "value": "10.0"
                    },
                    {
                        "key": "Discovery of Witches",
                        "value": "8.5"
                    }
                ]
            },
            {
                "genre": "Dystopia",
                "data": [
                    {
                        "key": "Hunger Games",
                        "value": "10.0"
                    },
                    {
                        "key": "Maze Runner",
                        "value": "5.5"
                    }
                ]
            }
        ]
    },
    {
        "url": "https://kindle.com/",
        "results": [
            {
                "genre": "Fantasy",
                "data": [
                    {
                        "key": "Divergent",
                        "value": "9.0"
                    },
                    {
                        "key": "Lord of the Rings",
                        "value": "9.0"
                    },
                    {
                        "key": "The Handmaids Tale",
                        "value": "10.0"
                    }
                ]
            }
        ]
    }
]
Divergent is repeated in the last list of dictionaries. When I merged my dictionaries, the duplicates inside https://kindle.com/ --> Fantasy were carried over into the merged list as well. Is there a way to remove the duplicate dictionary?
I want the https://kindle.com/ part to look like:
{
    "url": "https://kindle.com/",
    "results": [
        {
            "genre": "Fantasy",
            "data": [
                {
                    "key": "Divergent",
                    "value": "9.0"
                },
                {
                    "key": "Lord of the Rings",
                    "value": "9.0"
                },
                {
                    "key": "The Handmaids Tale",
                    "value": "10.0"
                }
            ]
        }
    ]
}
You can try converting those dicts to a set of tuples first (a set discards duplicates but does not preserve order) and then converting back to a list of dicts afterwards:
from itertools import groupby
from pprint import pprint

mongo_data = [{
    'url': 'https://goodreads.com/',
    'variables': [{'key': 'Harry Potter', 'value': '10.0'},
                  {'key': 'Discovery of Witches', 'value': '8.5'}],
    'vendor': 'Fantasy'
}, {
    'url': 'https://goodreads.com/',
    'variables': [{'key': 'Hunger Games', 'value': '10.0'},
                  {'key': 'Maze Runner', 'value': '5.5'}],
    'vendor': 'Dystopia'
}, {
    'url': 'https://kindle.com/',
    'variables': [{'key': 'Divergent', 'value': '9.0'},
                  {'key': 'Lord of the Rings', 'value': '9.0'}],
    'vendor': 'Fantasy'
}, {
    'url': 'https://kindle.com/',
    'variables': [{'key': 'The Handmaids Tale', 'value': '10.0'},
                  {'key': 'Divergent', 'value': '9.0'}],
    'vendor': 'Fantasy'
}]

searches = []
for key, group in groupby(mongo_data, key=lambda chunk: chunk['url']):
    search = {"url": key, "results": []}
    for vendor, group2 in groupby(group, key=lambda chunk2: chunk2['vendor']):
        result = {
            "genre": vendor,
            # Hashable (key, value) tuples in a set: duplicates collapse
            "data": set((var['key'], var['value'])
                        for result2 in group2
                        for var in result2["variables"]),
        }
        # Convert the tuples back to dicts; set iteration order is arbitrary
        result['data'] = [{"key": tup[0], "value": tup[1]} for tup in result['data']]
        search["results"].append(result)
    searches.append(search)

pprint(searches)
Output:
[{'results': [{'data': [{'key': 'Harry Potter', 'value': '10.0'},
                        {'key': 'Discovery of Witches', 'value': '8.5'}],
               'genre': 'Fantasy'},
              {'data': [{'key': 'Maze Runner', 'value': '5.5'},
                        {'key': 'Hunger Games', 'value': '10.0'}],
               'genre': 'Dystopia'}],
  'url': 'https://goodreads.com/'},
 {'results': [{'data': [{'key': 'The Handmaids Tale', 'value': '10.0'},
                        {'key': 'Lord of the Rings', 'value': '9.0'},
                        {'key': 'Divergent', 'value': '9.0'}],
               'genre': 'Fantasy'}],
  'url': 'https://kindle.com/'}]
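As the output shows, going through a set can reorder the items within each "data" list. If the first-seen order matters, one possible variation is to deduplicate with dict.fromkeys, which keeps insertion order on Python 3.7+. The sketch below is only an illustration: the helper names dedupe_pairs and build_searches and the shortened sample list are made up for this example, and it assumes the records are already ordered by url and then vendor, since groupby only merges adjacent items.

```python
from itertools import groupby


def dedupe_pairs(variables):
    """Drop duplicate {'key', 'value'} dicts, keeping first-seen order."""
    # dict.fromkeys keeps insertion order (Python 3.7+), unlike set()
    unique = dict.fromkeys((v["key"], v["value"]) for v in variables)
    return [{"key": k, "value": val} for k, val in unique]


def build_searches(records):
    """Group records by url, then vendor, deduplicating each 'data' list."""
    # Assumption: records are already ordered by url, then vendor,
    # because groupby only merges adjacent items with equal keys.
    searches = []
    for url, by_url in groupby(records, key=lambda r: r["url"]):
        search = {"url": url, "results": []}
        for vendor, by_vendor in groupby(by_url, key=lambda r: r["vendor"]):
            variables = [v for rec in by_vendor for v in rec["variables"]]
            search["results"].append(
                {"genre": vendor, "data": dedupe_pairs(variables)}
            )
        searches.append(search)
    return searches


# Hypothetical, shortened sample: only the kindle.com records from the question
sample = [
    {"url": "https://kindle.com/", "vendor": "Fantasy",
     "variables": [{"key": "Divergent", "value": "9.0"},
                   {"key": "Lord of the Rings", "value": "9.0"}]},
    {"url": "https://kindle.com/", "vendor": "Fantasy",
     "variables": [{"key": "The Handmaids Tale", "value": "10.0"},
                   {"key": "Divergent", "value": "9.0"}]},
]

result = build_searches(sample)
print(result[0]["results"][0]["data"])
```

With this approach the second Divergent entry is dropped and the remaining items stay in the order they first appeared, which matches the layout of the expected result in the question.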