Remove duplicate dictionaries from a list of dictionaries based on two keys

Question

I have a list of dictionaries, like this:

    my_list = [{'key1':'1', 'date':'2015-01-09'}, {'key1':'3', 'date':'2015-01-09'}, {'key1':'1', 'date':'2014-03-19'},
               {'key1':'4', 'date':'2015-05-09'}, ...]

In some of the dictionaries the value of key1 is repeated, and I want to remove the duplicates from the list based on date (another key of the dictionary), keeping only the dictionary with the earliest date for each key1 value.

Result:

    my_list = [{'key1':'3', 'date':'2015-01-09'}, {'key1':'1', 'date':'2014-03-19'}, {'key1':'4', 'date':'2015-05-09'} ,...]

Performance is important.

Answer 1

I would rebuild a dictionary with key1 as the key in a dictionary comprehension, iterating over the values sorted by date in reverse, so that the earliest dates come last and overwrite entries with the same key: only the earliest date remains:

my_list = [{'key1':'1', 'date':'2015-01-09'}, {'key1':'3', 'date':'2015-01-09'}, {'key1':'1', 'date':'2014-03-19'}, \
       {'key1':'4', 'date':'2015-05-09'}]

my_dict = {d["key1"]:d for d in sorted(my_list,key=lambda l:l["date"],reverse=True)}

print(list(my_dict.values()))

Result (I assumed that ordering doesn't matter: the dictionary does not keep the original list order, so the order of the output may differ from the one shown here, depending on the Python version):

[{'key1': '1', 'date': '2014-03-19'}, {'key1': '3', 'date': '2015-01-09'}, {'key1': '4', 'date': '2015-05-09'}]

(Note that sorting the dates in lexicographical order is OK because they're in YYYY-MM-DD format, which makes things easier: there is no need to parse the dates.)
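A quick check of why the plain string comparison is safe, assuming Python 3.7+ (for date.fromisoformat): zero-padded, year-first strings sort the same way as the dates they represent, while a month-first format (a hypothetical MM/DD/YYYY example, not from the question) does not.

```python
from datetime import date, datetime

iso_dates = ['2015-01-09', '2014-03-19', '2015-05-09']
# Zero-padded, fixed-width, year first: plain string order is chronological order
iso_ok = sorted(iso_dates) == sorted(iso_dates, key=date.fromisoformat)

# Hypothetical month-first strings: '01/...' sorts before '03/...' even though
# 01/09/2015 is later than 03/19/2014
us_dates = ['01/09/2015', '03/19/2014']
us_ok = sorted(us_dates) == sorted(
    us_dates, key=lambda s: datetime.strptime(s, '%m/%d/%Y')
)

print(iso_ok, us_ok)  # True False
```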

An alternative solution, if you're short on memory, is to avoid the sorting step: sorted() creates a sorted copy of the list first. The copy holds references to the same dictionaries, so the data itself is not duplicated, but the new list still takes some memory.

In that case, a classical loop will do: slower, but less memory-hungry (and no sorting needed). It uses get with a default date of 'A' when the key isn't yet in the destination dictionary, which forces insertion ('A' sorts after any digit, so it compares greater than any date string).

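my_dict = {}

for l in my_list:
    k = l['key1']
    d = l['date']

    # 'A' compares greater than any 'YYYY-MM-DD' string, so an unseen key is always inserted
    if my_dict.get(k, {'date': 'A'})['date'] > d:
        my_dict[k] = l  # keep the whole dictionary, not just its date

print(list(my_dict.values()))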
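The relative speed of the two versions is easy to check on your own data. A rough timing sketch, assuming synthetic random data (the sizes and the function names dedup_sorted, dedup_loop and earliest_dates are hypothetical) and a version of the loop that stores whole dictionaries. It makes no claim about which version wins: that depends on the data and the Python version.

```python
import random
import timeit

# Hypothetical synthetic data: 50,000 dicts spread over 1,000 key1 values
random.seed(0)
data = [
    {
        'key1': str(random.randrange(1000)),
        'date': f'{random.randrange(2010, 2020)}-{random.randrange(1, 13):02d}-{random.randrange(1, 29):02d}',
    }
    for _ in range(50_000)
]


def dedup_sorted(items):
    # First version: reverse-sort by date so the earliest date is written last
    return {d['key1']: d for d in sorted(items, key=lambda x: x['date'], reverse=True)}


def dedup_loop(items):
    # Second version: single pass; the default date 'A' forces insertion
    # (a throwaway default dict is built on every iteration)
    best = {}
    for d in items:
        if best.get(d['key1'], {'date': 'A'})['date'] > d['date']:
            best[d['key1']] = d
    return best


def earliest_dates(result):
    # Compare by date only: on ties the two versions may keep different
    # (but equally early) dictionaries
    return {k: v['date'] for k, v in result.items()}


assert earliest_dates(dedup_sorted(data)) == earliest_dates(dedup_loop(data))

for fn in (dedup_sorted, dedup_loop):
    seconds = timeit.timeit(lambda: fn(data), number=5)
    print(f'{fn.__name__}: {seconds:.3f} s for 5 runs')
```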

Answer 2

Both of the answers work, but I think that when I was a real beginner I would have preferred something a little simpler. What I would do is similar to @Jean_Francois's answer, but I think it is a bit simpler (though it has more lines of code).

I would build a dictionary from the list and check the date as I add each item to it. The date check is easy, as he noted:

from collections import defaultdict

min_date = defaultdict(dict)
for item_date in my_list:
    key = item_date['key1']
    date = item_date['date']
    if key in min_date:
        # already seen: keep whichever entry has the earlier date
        if min_date[key]['date'] > date:
            min_date[key] = item_date
    else:
        min_date[key] = item_date

This transformation places your items into a dictionary keyed by the value of key1:

defaultdict(<type 'dict'>, {'1': {'date': '2014-03-19', 'key1': '1'}, '3': {'date': '2015-01-09', 'key1': '3'}, '4': {'date': '2015-05-09', 'key1': '4'}})

Now, to put it back into a list:

item_date_list = list(min_date.values())
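The same approach can be wrapped in a reusable function. A minimal sketch: the name keep_earliest and its key_field/date_field parameters are hypothetical, a plain dict replaces the defaultdict (the membership check already handles missing keys), and the date values are assumed to compare correctly (for example YYYY-MM-DD strings).

```python
def keep_earliest(items, key_field='key1', date_field='date'):
    """Return one dict per key_field value: the one with the smallest date_field."""
    earliest = {}
    for item in items:
        key = item[key_field]
        # Insert unseen keys; replace seen ones only if this date is earlier
        if key not in earliest or item[date_field] < earliest[key][date_field]:
            earliest[key] = item
    # Dicts keep insertion order (Python 3.7+): first appearance of each key
    return list(earliest.values())


my_list = [{'key1': '1', 'date': '2015-01-09'}, {'key1': '3', 'date': '2015-01-09'},
           {'key1': '1', 'date': '2014-03-19'}, {'key1': '4', 'date': '2015-05-09'}]
print(keep_earliest(my_list))
# [{'key1': '1', 'date': '2014-03-19'}, {'key1': '3', 'date': '2015-01-09'}, {'key1': '4', 'date': '2015-05-09'}]
```

Because the key and date field names are parameters, the same helper works for records that use other names for those two keys.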

Answer 3

import pandas as pd

# sort by date, keep the first (earliest) row per key1, convert the rows back to dicts
result = pd.DataFrame(my_list).sort_values(by='date').drop_duplicates(subset=['key1'], keep='first').to_dict('records')

Answer 4

Here is a more verbose way to do it, which also keeps the original list order:

my_list = [{'key1':'1', 'date':'2015-01-09'}, 
           {'key1':'3', 'date':'2015-01-09'}, 
           {'key1':'1', 'date':'2014-03-19'},
           {'key1':'4', 'date':'2015-05-09'}]

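mins = {}
for i, d in enumerate(my_list):
    # record the earliest date seen so far for each key1, together with its index
    if d['key1'] not in mins or mins[d['key1']]['date'] > d['date']:
        mins[d['key1']] = {'date': d['date'], 'ind': i}

# sort the surviving indices so the output keeps the original list order
indices = sorted(d['ind'] for d in mins.values())
filtered = [my_list[i] for i in indices]
print(filtered)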
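Because this version keeps the original list order, the index bookkeeping can also be replaced by a filter on object identity, which avoids sorting the indices. A minimal sketch, assuming the list holds distinct dict objects (if the same dict object appeared twice and won, both occurrences would be kept); the name earliest is hypothetical.

```python
my_list = [{'key1': '1', 'date': '2015-01-09'},
           {'key1': '3', 'date': '2015-01-09'},
           {'key1': '1', 'date': '2014-03-19'},
           {'key1': '4', 'date': '2015-05-09'}]

# First pass: earliest dict per key1 (on ties, the first one seen wins)
earliest = {}
for d in my_list:
    k = d['key1']
    if k not in earliest or earliest[k]['date'] > d['date']:
        earliest[k] = d

# Second pass: keep exactly the winning objects, in their original positions
filtered = [d for d in my_list if earliest[d['key1']] is d]
print(filtered)
# [{'key1': '3', 'date': '2015-01-09'}, {'key1': '1', 'date': '2014-03-19'}, {'key1': '4', 'date': '2015-05-09'}]
```

This reproduces the order of the expected result given in the question.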

Answer 5

You can use itertools.groupby to group by key and then take the minimum date for each group. See the example below:

from itertools import groupby
final_list = [min(g, key=lambda x: x['date']) for k, g in groupby(sorted(my_list, key=lambda x: x['key1']), key=lambda x: x['key1'])]

Result:

[{'date': '2014-03-19', 'key1': '1'}, {'date': '2015-01-09', 'key1': '3'}, {'date': '2015-05-09', 'key1': '4'}]
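itertools.groupby only groups consecutive items with the same key, which is why the list is sorted by key1 before grouping. A short illustration using the question's sample data (the helper name by_key is hypothetical):

```python
from itertools import groupby

my_list = [{'key1': '1', 'date': '2015-01-09'}, {'key1': '3', 'date': '2015-01-09'},
           {'key1': '1', 'date': '2014-03-19'}, {'key1': '4', 'date': '2015-05-09'}]


def by_key(x):
    return x['key1']


# Without sorting, the two '1' entries are not adjacent and form separate groups
unsorted_keys = [k for k, _ in groupby(my_list, key=by_key)]
print(unsorted_keys)  # ['1', '3', '1', '4']

# Sorting by the same key first makes every key1 value a single group
sorted_keys = [k for k, _ in groupby(sorted(my_list, key=by_key), key=by_key)]
print(sorted_keys)  # ['1', '3', '4']
```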

5 answers

Answer 1: score 5, 2017-05-01 20:07:48
Answer 2: score 1, 2017-05-01 20:47:04
Answer 3: score 0, 2017-05-01 20:11:07
Answer 4: score 0, 2017-05-01 20:15:32
Answer 5: score 0, 2017-05-01 20:18:57
