Cartesian product of multiple lists of dictionaries

I have two or more lists, each of which is a list of dictionaries (something like JSON format), for example:

list_1 = [{'Name': 'John' , 'Age': 25} , {'Name': 'Mary' , 'Age': 15}]
list_2 = [{'Product': 'Car', 'Id': 1} , {'Product': 'TV' , 'Id': 2}]
cartesian_product(list_1 * list_2) = [{'Name': 'John', 'Age':25, 'Product': 'Car', 'Id': 1}, {'Name': 'John', 'Age':25, 'Product': 'TV', 'Id': 2}, {'Name': 'Mary' , 'Age': 15, 'Product': 'Car', 'Id': 1}, {'Name': 'Mary' , 'Age': 15, 'Product': 'TV', 'Id': 2}]

How can I do this while staying efficient with memory? The way I'm doing it right now runs out of RAM with big lists. I know it probably involves itertools.product, but I couldn't figure out how to use it with a list of dicts. Thank you.

P.S. This is how I'm doing it for the moment:

gen1 = (row for row in self.tables[0])
table = []
for row in gen1:
    # "other" avoids reusing the name of the result list "table"
    gen2 = (dictionary for other in self.tables[1:] for dictionary in other)
    for element in gen2:
        new_row = {}
        new_row.update(row)
        new_row.update(element)
        table.append(new_row)

Thank you!

Here is a solution to the problem as posted:

list_1 = [{'Name': 'John' , 'Age': 25} , {'Name': 'Mary' , 'Age': 15}]
list_2 = [{'Product': 'Car', 'Id': 1} , {'Product': 'TV' , 'Id': 2}]


from itertools import product
ret_list = []
for i1, i2 in product(list_1, list_2):
    merged = {}
    merged.update(i1)
    merged.update(i2)
    ret_list.append(merged)

The key here is to use the dict update method to add members. This version leaves the parent dicts unmodified, and silently drops duplicate keys in favor of whichever value is seen last.
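To make the duplicate-key behavior concrete, here is a small sketch. The inputs are hypothetical (they are not the lists from the question) and were chosen so that both sides carry an 'Id' key.

```python
from itertools import product

# Hypothetical inputs that share the key 'Id' (not taken from the question).
left = [{'Name': 'John', 'Id': 100}]
right = [{'Product': 'Car', 'Id': 1}]

merged_rows = []
for a, b in product(left, right):
    merged = {}
    merged.update(a)
    merged.update(b)  # b's 'Id' overwrites the value copied from a
    merged_rows.append(merged)

print(merged_rows)  # [{'Name': 'John', 'Id': 1, 'Product': 'Car'}]
print(left)         # [{'Name': 'John', 'Id': 100}]  (inputs are untouched)
```

If losing a value this way is not acceptable, one option is to rename or prefix the colliding keys before merging.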

However, this will not help with memory usage. The simple fact is that if you want to do this operation in memory, you need to be able to store both the starting lists and the resulting product. Alternatives include periodically writing to disk, or breaking the starting data into chunks and deleting chunks as you go.
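One way to avoid holding the whole result in memory is to yield merged rows one at a time and write each out as it is produced. The following is a minimal Python 3 sketch of that idea, not a definitive implementation: the helper names iter_merged and dump_rows are made up for illustration, and an io.StringIO buffer stands in for a real file opened with open(). The input lists themselves still have to fit in memory; only the product is streamed.

```python
import io
import json
from itertools import product

def iter_merged(list_a, list_b):
    """Yield one merged dict per pair without building the full result list."""
    for a, b in product(list_a, list_b):
        yield {**a, **b}

def dump_rows(rows, fh):
    """Write each row as one JSON line to the file-like object fh; return the count."""
    count = 0
    for row in rows:
        fh.write(json.dumps(row) + "\n")
        count += 1
    return count

list_1 = [{'Name': 'John', 'Age': 25}, {'Name': 'Mary', 'Age': 15}]
list_2 = [{'Product': 'Car', 'Id': 1}, {'Product': 'TV', 'Id': 2}]

buffer = io.StringIO()  # stand-in for a file on disk
written = dump_rows(iter_merged(list_1, list_2), buffer)
print(written)  # 4
```

Because iter_merged is a generator, only one merged row exists at any moment while dump_rows consumes it.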

Just convert the dictionaries to lists of items, take the product, and convert back to dictionaries again (Python 2 syntax):

import itertools

list_1 = [{'Name': 'John' , 'Age': 25} , {'Name': 'Mary' , 'Age': 15}]
list_2 = [{'Product': 'Car', 'Id': 1} , {'Product': 'TV' , 'Id': 2}]
l1 = [l.items() for l in list_1]
l2 = [l.items() for l in list_2]
print [dict(l[0] + l[1]) for l in itertools.product(l1, l2)]

The output is:

[{'Age': 25, 'Id': 1, 'Name': 'John', 'Product': 'Car'}, {'Age': 25, 'Id': 2, 'Name': 'John', 'Product': 'TV'}, {'Age': 15, 'Id': 1, 'Name': 'Mary', 'Product': 'Car'}, {'Age': 15, 'Id': 2, 'Name': 'Mary', 'Product': 'TV'}]

If this isn't memory-efficient enough for you, then try:

for d1, d2 in itertools.product(list_1, list_2):
    # work with one product at a time; product() yields pairs lazily,
    # so the full result list is never built
    merged = dict(d1)
    merged.update(d2)

For Python 3:

import itertools

list_1 = [{'Name': 'John' , 'Age': 25} , {'Name': 'Mary' , 'Age': 15}]
list_2 = [{'Product': 'Car', 'Id': 1} , {'Product': 'TV' , 'Id': 2}]
print([{**l[0], **l[1]} for l in itertools.product(list_1, list_2)])
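Since the question mentions two or more lists, the same approach can be extended to any number of them by unpacking into itertools.product. This is a sketch under assumptions: the function name cartesian_merge and the third list list_3 are hypothetical and do not appear in the question.

```python
from itertools import product

def cartesian_merge(*tables):
    """Yield one merged dict for every combination of one row from each table."""
    for combo in product(*tables):
        merged = {}
        for row in combo:
            merged.update(row)  # later tables win on duplicate keys
        yield merged

list_1 = [{'Name': 'John', 'Age': 25}, {'Name': 'Mary', 'Age': 15}]
list_2 = [{'Product': 'Car', 'Id': 1}, {'Product': 'TV', 'Id': 2}]
list_3 = [{'Color': 'Red'}, {'Color': 'Blue'}]  # hypothetical third list

rows = list(cartesian_merge(list_1, list_2, list_3))
print(len(rows))  # 8
print(rows[0])    # {'Name': 'John', 'Age': 25, 'Product': 'Car', 'Id': 1, 'Color': 'Red'}
```

Calling list() here is only for the demonstration; iterating over cartesian_merge(...) directly keeps one row in memory at a time.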
