A simple way to compare values across all dicts in a list of dicts?

Question

Suppose I have a list of dicts named mydict that looks like this:

[{'id': 6384,
  'character': 'Thomas A. Anderson / Neo',
  'credit_id': '52fe425bc3a36847f80181c1',
  'movie_id': 603},
 {'id': 2975,
  'character': 'Morpheus',
  'credit_id': '52fe425bc3a36847f801818d',
  'movie_id': 603},
 {'id': 530,
  'character': 'Trinity',
  'credit_id': '52fe425bc3a36847f8018191',
  'movie_id': 603},
 {'id': 1331,
  'character': 'Agent Smith',
  'credit_id': '52fe425bc3a36847f8018195',
  'movie_id': 603},
 {'id': 3165802,
  'character': 'MP Sergeant #1',
  'credit_id': '62ade87f4142910051c8e002',
  'movie_id': 28},
 {'id': 18471,
  'character': 'Self',
  'credit_id': '6259ed263acd2016291eef43',
  'movie_id': 963164},
 {'id': 74611,
  'character': 'Self',
  'credit_id': '6259ed37ecaef515ff68cae6',
  'movie_id': 963164}]

I want to get all pairs of 'id' values whose dicts share the same 'movie_id' value, using only the Python standard library. Essentially, it should return

(6384, 2975)
(6384, 530)
(6384, 1331)
....
(18471, 74611)

Looping over every possible combination like this seems feasible, but it is slow.

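results = []
for i in mydict:
    for j in mydict:
        current = i['movie_id']
        other = j['movie_id']  # renamed from `next` to avoid shadowing the built-in
        if current == other:
            # append() takes a single argument, so pass the pair as a tuple
            results.append((i['id'], j['id']))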

Is there a dict comprehension approach that achieves the same result?

Answer 1

Consider using collections.defaultdict() to group the ids by movie_id. Then use itertools.combinations() to loop over each group pairwise:

from collections import defaultdict
from itertools import combinations

d = defaultdict(list)
for movie in mydict:
    d[movie['movie_id']].append(movie['id'])

for group in d.values():
    for pair in combinations(group, 2):
        print(pair)

For the given dataset, this outputs:

(6384, 2975)
(6384, 530)
(6384, 1331)
(2975, 530)
(2975, 1331)
(530, 1331)
(18471, 74611)
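
The same grouping can be wrapped in a function that returns the pairs instead of printing them, with a comprehension doing the pairing step the question asks about. This is a minimal sketch: the function name id_pairs_by_movie, the records parameter, and the shortened sample list are illustrative and not part of the answer above.

```python
from collections import defaultdict
from itertools import combinations


def id_pairs_by_movie(records):
    """Return all pairs of 'id' values whose records share a 'movie_id'."""
    groups = defaultdict(list)
    for record in records:
        groups[record['movie_id']].append(record['id'])
    # Pairs are built only within each group, never across groups.
    return [pair for ids in groups.values() for pair in combinations(ids, 2)]


# Shortened, hypothetical sample with the same shape as the question's data.
sample = [
    {'id': 6384, 'movie_id': 603},
    {'id': 2975, 'movie_id': 603},
    {'id': 3165802, 'movie_id': 28},
    {'id': 530, 'movie_id': 603},
    {'id': 18471, 'movie_id': 963164},
    {'id': 74611, 'movie_id': 963164},
]
pairs = id_pairs_by_movie(sample)
print(pairs)
```

Grouping takes one pass over the list, so the only quadratic work is inside each group, which is unavoidable because every pair in a group is part of the result.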

Answer 2

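If a third-party dependency is acceptable, a simple and readable solution is to use the pandas library: join the frame to itself on movie_id and keep each pair once.

import pandas as pd

df = pd.DataFrame(mydict).reset_index()
# Self-join on movie_id: every row is paired with every row of the same movie.
pairs = df.merge(df, on='movie_id')
# Keep each unordered pair once and drop rows paired with themselves.
pairs = pairs[pairs['index_x'] < pairs['index_y']]
print(pairs[['id_x', 'id_y']])

This should print one row per pair.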

Answer 3

You can use groupby and combinations.

groupby expects entries with the same movie_id to be adjacent in the main list; otherwise you have to sort the main list by movie_id first.

In [18]: from itertools import groupby

In [19]: from itertools import combinations

In [20]: for k,l in groupby(mydict, key=lambda x:x['movie_id']):
    ...:     print(list(combinations([i.get('id') for i in l], 2)))
    ...: 
[(6384, 2975), (6384, 530), (6384, 1331), (2975, 530), (2975, 1331), (530, 1331)]
[]
[(18471, 74611)]
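
When the input is not already ordered by movie_id, sorting first keeps groupby from splitting one movie into several groups. A minimal sketch of that variant follows; the function name grouped_id_pairs and the sample data are hypothetical and chosen only to show interleaved movie ids.

```python
from itertools import combinations, groupby
from operator import itemgetter


def grouped_id_pairs(records):
    """Map each movie_id to the list of id pairs that share it."""
    by_movie = itemgetter('movie_id')
    # groupby only merges adjacent items, so sort by the same key first.
    ordered = sorted(records, key=by_movie)
    return {
        movie_id: list(combinations([r['id'] for r in group], 2))
        for movie_id, group in groupby(ordered, key=by_movie)
    }


# Hypothetical sample in which movie ids are interleaved.
unsorted_sample = [
    {'id': 1, 'movie_id': 603},
    {'id': 2, 'movie_id': 28},
    {'id': 3, 'movie_id': 603},
    {'id': 4, 'movie_id': 28},
    {'id': 5, 'movie_id': 7},
]
result = grouped_id_pairs(unsorted_sample)
print(result)
```

Sorting costs O(n log n), whereas the defaultdict approach groups in a single pass without sorting, so this variant mainly makes sense when the data is already sorted or nearly so.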

Answer 4

Using pandas:

import itertools
import pandas as pd

# lst is your list of dicts
out = pd.DataFrame(lst).groupby('movie_id')['id'].apply(
    lambda x: list(itertools.combinations(x, 2))).to_dict()

Using itertools:

from itertools import combinations, groupby

out = {
        k: list(combinations([d['id'] for d in list(g)], 2))
        for k, g in groupby(lst, lambda x: x['movie_id'])
      }

Prints:

{28: [],
 603: [(6384, 2975),
  (6384, 530),
  (6384, 1331),
  (2975, 530),
  (2975, 1331),
  (530, 1331)],
 963164: [(18471, 74611)]}
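
If a flat list of pairs is wanted, as in the question, rather than a dict keyed by movie_id, the per-movie lists can be chained together. A small sketch, assuming out holds the dict printed above; the name flat_pairs is illustrative.

```python
from itertools import chain

# The dict printed above, reproduced so the snippet runs on its own.
out = {
    28: [],
    603: [(6384, 2975), (6384, 530), (6384, 1331),
          (2975, 530), (2975, 1331), (530, 1331)],
    963164: [(18471, 74611)],
}

# Empty groups such as movie 28 contribute nothing to the flattened list.
flat_pairs = list(chain.from_iterable(out.values()))
print(flat_pairs)
```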

4 answers

Answer 1
1, accepted, 2022-08-29 22:22:26

Answer 2
0, 2022-08-29 22:22:13

Answer 3
0, 2022-08-29 22:23:51

Answer 4
0, 2022-08-29 22:41:11
