List of dicts, group without intersection by list keys

I need help optimizing my code.

I have this data:

data = [
  {"ids": [1]},
  {"ids": [3, 4]},
  {"ids": [1, 2]},
  {"ids": [2]},
]

I need to group it so that the dicts within each group have no ids in common. The expected result is:

expected = [
  [{"ids": [1]}, {"ids": [2]}],
  [{"ids": [3, 4]}, {"ids": [1, 2]}],
]  # only 2 sublists here

My code to split it (not optimized):

import itertools as it

def _split(
    list_of_dicts,
):
    splitted_list_of_dicts = []
    sub_list = []
    while list_of_dicts:
        for dct in list_of_dicts:
            ids_in_sub_list = set(
                it.chain(*[sub_list_el["ids"] for sub_list_el in sub_list]),
            )
            if not set(dct["ids"]).intersection(ids_in_sub_list):
                sub_list.append(dct)
                list_of_dicts.remove(dct)
        splitted_list_of_dicts.append(sub_list)
        sub_list = []
    return splitted_list_of_dicts

The result of my code is:

result = [
    [{'ids': [1]}, {'ids': [2]}],
    [{'ids': [3, 4]}],
    [{'ids': [1, 2]}]
]  # 3 sublists

I get one list more than necessary, and that is what I am trying to optimize away. If you have any ideas on how to help, I'd be grateful. Thanks for your time.

Another example:

data = [
  {"ids": [1]},
  {"ids": [3, 4]},
  {"ids": [1, 2]},
  {"ids": [4]},
  {"ids": [3]},
  {"ids": [2]},
]

can be grouped into a list of 2 sublists:

expected = [
    [{'ids': [1]}, {'ids': [4]}, {'ids': [2]}, {'ids': [3]}],
    [{'ids': [3, 4]}, {'ids': [1, 2]}],
]

but my code currently produces 4:

result = [
    [{'ids': [1]}, {'ids': [4]}, {'ids': [2]}],
    [{'ids': [3, 4]}],
    [{'ids': [1, 2]}],
    [{'ids': [3]}]
]

If any grouping that contains no duplicate ids within a group is acceptable, you can simply iterate over the data list and append each dict to the first group in the result that does not already contain any of its ids. If no such group exists, start a new one.

def split(list_of_dicts):
    result_helper = [set()]  # This will be a list of sets for easy membership checks
    result_list = [[]]  # This will be what we return
    for d in list_of_dicts:
        for s, l in zip(result_helper, result_list):
            if not any(x in s for x in d["ids"]):
                s.update(d["ids"])
                l.append(d)
                break
        else:
            # for loop ended without being broken
            # This means no elements of result_list took this dict item. 
            # So create a new element
            result_list.append([d])
            result_helper.append(set(d["ids"]))
    return result_list

With your original data,

data = [
  {"ids": [1]},
  {"ids": [3, 4]},
  {"ids": [1, 2]},
  {"ids": [2]},
]
split(data)

we get the output:

 [
    [{'ids': [1]}, {'ids': [3, 4]}, {'ids': [2]}],
    [{'ids': [1, 2]}]
 ]

This seems to be an acceptable solution, because none of the sublists contains a duplicated id.
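
The "no duplicated id" claim can also be checked mechanically. Below is a minimal sketch, assuming a hypothetical helper named `has_no_overlap` (not part of the code above) that returns `True` only when no id appears twice within any single sublist:

```python
def has_no_overlap(groups):
    """Return True if no id appears in two dicts of the same group."""
    for group in groups:
        seen = set()  # ids collected so far in this group
        for d in group:
            ids = set(d["ids"])
            if seen & ids:  # some id was already used in this group
                return False
            seen |= ids
    return True


# The output shown above for the original data
first_output = [
    [{"ids": [1]}, {"ids": [3, 4]}, {"ids": [2]}],
    [{"ids": [1, 2]}],
]
print(has_no_overlap(first_output))  # True

# A deliberately invalid grouping (hypothetical), where id 1 appears twice
bad = [[{"ids": [1]}, {"ids": [1, 2]}]]
print(has_no_overlap(bad))  # False
```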

And with the second example:

data = [
  {"ids": [1]},
  {"ids": [3, 4]},
  {"ids": [1, 2]},
  {"ids": [4]},
  {"ids": [3]},
  {"ids": [2]},
]
split(data)

This gives the output:

 [
    [{'ids': [1]}, {'ids': [3, 4]}, {'ids': [2]}],
    [{'ids': [1, 2]}, {'ids': [4]}, {'ids': [3]}]
 ]

No duplicates in this case either.
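
As a side note, a likely cause of the extra sublists in the question's `_split` is that it calls `list_of_dicts.remove(dct)` while the `for` loop is still iterating over `list_of_dicts`, so the element right after each removed one is skipped in that pass. The following is a minimal sketch of the same pass-by-pass idea without that mutation. The name `split_fixed` is hypothetical, and this is only one possible way to restructure the loop:

```python
def split_fixed(list_of_dicts):
    """Greedy pass-by-pass grouping that never mutates a list while iterating it."""
    remaining = list(list_of_dicts)  # work on a copy; the caller's list is untouched
    groups = []
    while remaining:
        group, used_ids, leftover = [], set(), []
        for dct in remaining:
            if used_ids.isdisjoint(dct["ids"]):
                # No id of this dict is in the current group yet, so take it
                group.append(dct)
                used_ids.update(dct["ids"])
            else:
                # Conflicts with the current group; retry in the next pass
                leftover.append(dct)
        groups.append(group)
        remaining = leftover
    return groups


data = [
    {"ids": [1]},
    {"ids": [3, 4]},
    {"ids": [1, 2]},
    {"ids": [2]},
]
groups = split_fixed(data)
print(groups)
# [[{'ids': [1]}, {'ids': [3, 4]}, {'ids': [2]}], [{'ids': [1, 2]}]]
```

The first dict of every pass always joins the new group, so `remaining` shrinks on each pass and the loop terminates.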

As far as I can tell from your question, you are essentially grouping the dicts by the cardinality of their ids lists, that is, by how many ids each dict contains.

from itertools import groupby


def transform(data):
    def cardinality(item):
        return len(item['ids'])

    # groupby only merges adjacent items, so sort by the same key first
    sorted_data = sorted(data, key=cardinality)
    return [list(group) for _, group in groupby(sorted_data, key=cardinality)]

For the second example data, this gives:

[
    [
        {'ids': [1]},
        {'ids': [4]},
        {'ids': [3]},
        {'ids': [2]}
    ],
    [
        {'ids': [3, 4]},
        {'ids': [1, 2]}
    ]
]
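
One caveat worth checking before relying on this: grouping by length keeps a group free of shared ids only if dicts of the same length never share an id, which happens to hold for both examples in the question. A minimal sketch with hypothetical input data (the helper `shares_id` is also an assumption, added only for the check):

```python
from itertools import groupby


def transform(data):
    """Group dicts by how many ids they hold (same idea as above)."""
    def cardinality(item):
        return len(item["ids"])

    sorted_data = sorted(data, key=cardinality)
    return [list(group) for _, group in groupby(sorted_data, key=cardinality)]


def shares_id(group):
    """Return True if any id occurs in more than one dict of the group."""
    all_ids = [i for d in group for i in set(d["ids"])]
    return len(all_ids) != len(set(all_ids))


# Hypothetical input: two single-id dicts that both contain id 1
data = [{"ids": [1]}, {"ids": [1]}, {"ids": [2, 3]}]
groups = transform(data)
overlaps = [shares_id(g) for g in groups]
print(groups)    # [[{'ids': [1]}, {'ids': [1]}], [{'ids': [2, 3]}]]
print(overlaps)  # [True, False]
```

Here the first group contains id 1 twice, so for inputs like this a membership-based approach would be needed instead.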
