Fastest way to find the common elements between 2 lists of dictionaries in Python

Question

I have 2 lists of dictionaries.

list1 = [{'user_id':23, 'user_name':'John', 'age':30},
         {'user_id':24, 'user_name':'Shaun', 'age':31},
         {'user_id':25, 'user_name':'Johny', 'age':32}]

list2 =[{'user_id':23},
        {'user_id':25}]

Now I want this output:

list3 = [{'user_id':23, 'user_name':'John', 'age':30},
         {'user_id':25, 'user_name':'Johny','age':32}]

I want the most efficient approach, because my list1 may contain millions of rows.

Answer 1

You have to transform list2 slightly to get fast lookups. I would turn its IDs into a set:

list1 = [{'user_id':23, 'user_name':'John','age':30},
         {'user_id':24, 'user_name':'Shaun','age':31},
         {'user_id':25, 'user_name':'Johny','age':32}]

list2 =[{'user_id':23},
        {'user_id':25}]

list2_ids = {d['user_id'] for d in list2}

Then build list3 with a filtering list comprehension. Here the test in list2_ids is very fast, because it uses a set (hash) lookup instead of a linear search:

list3 = [x for x in list1 if x['user_id'] in list2_ids]

print(list3)

Result:

[{'user_id': 23, 'user_name': 'John', 'age': 30}, {'user_id': 25, 'user_name': 'Johny', 'age': 32}]
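
To make the difference between a set lookup and a linear search concrete, here is a rough timing sketch. The synthetic data, its size, and the helper name filter_with are assumptions made for illustration only; the absolute timings will vary by machine.

```python
import timeit

# Hypothetical synthetic data, sized only for illustration.
n = 2000
list1 = [{'user_id': i, 'user_name': f'user{i}', 'age': 20 + i % 50} for i in range(n)]
list2 = [{'user_id': i} for i in range(0, n, 2)]

ids_as_list = [d['user_id'] for d in list2]  # membership test scans the list
ids_as_set = {d['user_id'] for d in list2}   # membership test is a hash lookup


def filter_with(ids):
    """Keep the records of list1 whose user_id is contained in ids."""
    return [x for x in list1 if x['user_id'] in ids]


slow = timeit.timeit(lambda: filter_with(ids_as_list), number=5)
fast = timeit.timeit(lambda: filter_with(ids_as_set), number=5)
print(f'list lookup: {slow:.4f}s, set lookup: {fast:.4f}s')
```

Both calls return the same records; only the cost of each membership test differs, and the gap should widen as list2 grows.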

Answer 2

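I would convert list1 into a dictionary whose keys are the user_id values and whose values hold the name and age.

A lookup in this dict is then O(1) on average, even if the dict has many elements.

In that case, the total complexity of looking up all the user IDs is O(len(list2)) once the dict has been built (building it is a single O(len(list1)) pass).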

dict1 = {23 : {'user_name':'John', 'age':30},
         24 : {'user_name':'Shaun', 'age':31},
         25 : {'user_name':'Johny', 'age':32}}

list2 =[{'user_id':23},
        {'user_id':25}]

res = [dict1.get(user['user_id']) for user in list2 if user['user_id'] in dict1]

print(res)

>>> [{'user_name': 'John', 'age': 30}, {'user_name': 'Johny', 'age': 32}]
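
The answer starts from a ready-made dict1 and does not show the conversion step. A minimal sketch of one way to build it from list1 follows; the nested comprehension is an assumption about how the conversion might be done, not the answerer's own code.

```python
list1 = [{'user_id': 23, 'user_name': 'John', 'age': 30},
         {'user_id': 24, 'user_name': 'Shaun', 'age': 31},
         {'user_id': 25, 'user_name': 'Johny', 'age': 32}]

list2 = [{'user_id': 23},
         {'user_id': 25}]

# Index list1 by user_id in one pass; the remaining fields become the value.
dict1 = {d['user_id']: {k: v for k, v in d.items() if k != 'user_id'}
         for d in list1}

# Each lookup is a hash lookup, so this loop is O(len(list2)).
res = [dict1[u['user_id']] for u in list2 if u['user_id'] in dict1]

print(res)
```

Note that the values no longer contain user_id. If the full records are needed, as in the question's list3, storing d itself as the value ({d['user_id']: d for d in list1}) avoids the copy and keeps the key in the result.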

Answer 3

You can use pandas to merge the data as DataFrames.
1. Convert the lists of dicts into DataFrames
2. Merge the two DataFrames on 'user_id'

import pandas as pd
list1 = [{'user_id':23, 'user_name':'John', 'age':30},
          {'user_id':24, 'user_name':'Shaun', 'age':31},
          {'user_id':25, 'user_name':'Johny', 'age':32}] 
list2 =[{'user_id':23},
         {'user_id':25}] 
df1 = pd.DataFrame(list1)
df1
   age  user_id user_name
0   30       23      John
1   31       24     Shaun
2   32       25     Johny
df2 = pd.DataFrame(list2)
df2
   user_id
0       23
1       25

pd.merge(df2,df1,on='user_id')
   user_id  age user_name
0       23   30      John
1       25   32     Johny

Answer 4

As the previous posters said, you first need to build a set of IDs from list2:

list2_ids = {d['user_id'] for d in list2}

Once that is done, you can also use the built-in filter function:

list3 = list(filter(lambda x: x['user_id'] in list2_ids, list1))  # in Python 3, filter returns a lazy iterator

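Although filter is not optimized for this, it has the advantage that several parallel-computing implementations of it exist (which you may need if you process large amounts of data).

That said, the best solution performance-wise is probably a set intersection (to be compared):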

unique_ids = set([d['user_id'] for d in list1]) & set([d['user_id'] for d in list2])
list3 = [x for x in list1 if x['user_id'] in unique_ids]

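If you are sure the lists contain no duplicate IDs, the deduplication that set provides is not needed. Keep in mind, though, that the & operator only works on sets, and that membership tests against a plain list are linear.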
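
As a small sketch of the two variants in this answer, the snippet below shows that filter yields matches lazily in Python 3 and that the set-intersection version selects the same records. The variable names matches, first, rest, and common_ids are made up for illustration.

```python
list1 = [{'user_id': 23, 'user_name': 'John', 'age': 30},
         {'user_id': 24, 'user_name': 'Shaun', 'age': 31},
         {'user_id': 25, 'user_name': 'Johny', 'age': 32}]

list2 = [{'user_id': 23},
         {'user_id': 25}]

list2_ids = {d['user_id'] for d in list2}

# filter returns an iterator: nothing is evaluated until items are requested.
matches = filter(lambda x: x['user_id'] in list2_ids, list1)
first = next(matches)   # pulls only the first matching record
rest = list(matches)    # consumes the remaining matches

# Set-intersection variant: keep only the IDs present in both lists.
common_ids = {d['user_id'] for d in list1} & list2_ids
list3 = [x for x in list1 if x['user_id'] in common_ids]

print(first)
print(rest)
print(list3)
```

The lazy form can be handy when list1 is too large to materialize the whole result at once. The intersection form does one extra pass over list1 to collect its IDs, so whether it is actually faster than testing against list2_ids directly is worth measuring on real data.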


Answer 1
5 2017-07-10 13:24:45

Answer 2
1 2017-07-10 13:32:55

Answer 3
0 2017-07-10 13:42:05

Answer 4
0 2017-07-10 13:48:14
