All combinations of a set of dictionaries into K N-sized groups

I thought this would be straightforward; unfortunately, it is not.

I am trying to build a function that takes an iterable of dictionaries (i.e., a list of unique dictionaries) and returns a list of lists of unique groupings of the dictionaries.

If I have x players, I would like to form k teams of size n.

This question and set of answers from CMSDK is the closest thing to a solution I can find. In adapting it from processing strings of letters to dictionaries, I am finding my Python skills inadequate.

The original function that I am adapting comes from the second answer:

import itertools as it
def unique_group(iterable, k, n):
    """Return an iterator, comprising groups of size `k` with combinations of size `n`."""
    # Build separate combinations of `n` characters
    groups = ("".join(i) for i in it.combinations(iterable, n))    # 'AB', 'AC', 'AD', ...
    # Build unique groups of `k` by keeping the longest sets of characters
    return (i for i in it.product(groups, repeat=k) 
                if len(set("".join(i))) == sum((map(len, i))))     # ('AB', 'CD'), ('AB', 'CE'), ... 

My current adaptation (which utterly fails with TypeError: object of type 'generator' has no len() because of the call to map(len, i)):

def unique_group(iterable, k, n):
    groups = []
    groups.append((i for i in it.combinations(iterable, n)))
    return ( i for i in it.product(groups, repeat=k) if len(set(i)) == sum((map(len, i))) )
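The failure can be reproduced in isolation. The sketch below uses a four-player list and assumes k = 2 and n = 2; it suggests that `groups` ends up holding a single generator object, so `it.product` sees only one "element" and `len()` is then called on that generator rather than on a team.

```python
import itertools as it

players = [{'name': 'Patricia', 'skill': 4}, {'name': 'Christopher', 'skill': 6},
           {'name': 'Nicholas', 'skill': 7}, {'name': 'Bianca', 'skill': 4}]

groups = []
# append() adds ONE item: the generator itself, not the combinations it would yield.
groups.append(i for i in it.combinations(players, 2))

# product() over a one-element list yields exactly one tuple: (generator, generator).
candidates = list(it.product(groups, repeat=2))
print(len(groups), len(candidates))

try:
    # This is what map(len, i) attempts for each member of the tuple.
    len(candidates[0][0])
    failure = None
except TypeError as exc:
    failure = str(exc)
print(failure)
```

Building the list with `groups = list(it.combinations(players, 2))` would give `product` real teams to work with, although the uniqueness test would still need to compare players rather than characters.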

For a bit of context: I am trying to programmatically divide a group of players into teams for Christmas Trivia based on their skills. The list of dictionaries is formed from a YAML file that looks like:

- name: Patricia
  skill: 4
- name: Christopher
  skill: 6
- name: Nicholas
  skill: 7
- name: Bianca
  skill: 4

Which, after yaml.load, produces a list of dictionaries:

players = [{'name':'Patricia', 'skill':4},{'name':'Christopher','skill':6},
           {'name':'Nicholas','skill':7},{'name':'Bianca','skill':4}]

So I expect output that would look like a list of these (where k = 2 and n = 2):

(
    # Team assignment grouping 1
    (
        # Team 1
        ( {'name': 'Patricia', 'skill': 4}, {'name': 'Christopher', 'skill': 6} ),
        # Team 2
        ( {'name': 'Nicholas', 'skill': 7}, {'name': 'Bianca', 'skill': 4} )
    ),
    # Team assignment grouping 2
    (
        # Team 1
        ( {'name': 'Patricia', 'skill': 4}, {'name': 'Bianca', 'skill': 4} ),
        # Team 2
        ( {'name': 'Nicholas', 'skill': 7}, {'name': 'Christopher', 'skill': 6} )
    ),

    ...,

    # More unique lists

)

Each team assignment grouping needs to have unique players across teams (i.e., the same player cannot be on multiple teams in a team assignment grouping), and each team assignment grouping needs to be unique.

Once I have the list of team assignment combinations, I will sum the skills in every team, take the difference between the highest and lowest team totals, and choose the grouping with the smallest difference (that is, the least variance).

I will admit I do not understand this code fully. I understand the first assignment, which creates a list of all the combinations of the letters in a string, and the return statement, which finds the product under the condition that the product does not contain the same letter in different groups.

My initial attempt was to simply take it.product(it.combinations(iterable, n), repeat=k), but this does not achieve uniqueness across groups (i.e., I get the same player on different teams in one grouping).
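To see how much of that raw product is unusable, here is a small sketch with the four sample players and k = 2, n = 2. The helper name `has_overlap` is made up for this illustration; it compares players by their `name` key, on the assumption that names are unique.

```python
import itertools as it

players = [{'name': 'Patricia', 'skill': 4}, {'name': 'Christopher', 'skill': 6},
           {'name': 'Nicholas', 'skill': 7}, {'name': 'Bianca', 'skill': 4}]

# Every ordered pair of 2-player teams, with no uniqueness check at all.
candidates = list(it.product(it.combinations(players, 2), repeat=2))


def has_overlap(assignment):
    """Return True if any player name appears on more than one team."""
    names = [p['name'] for team in assignment for p in team]
    return len(set(names)) < len(names)


overlapping = [a for a in candidates if has_overlap(a)]
print(len(candidates), len(overlapping))  # 36 30
```

Only 6 of the 36 candidates are valid here, and those 6 are really 3 groupings listed in both team orders.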

Thanks in advance, and Merry Christmas!


Update:

After a considerable amount of fiddling, I have gotten the adaptation to this, which does not work:

def unique_group(iterable, k, n):
    groups = []
    groups.append((i for i in it.combinations(iterable, n)))
    return (i for i in it.product(groups, repeat=k)\
        if len(list({v['name']:v for v in it.chain.from_iterable(i)}.values())) ==\
        len(list([x for x in it.chain.from_iterable(i)])))

I get an error:

Traceback (most recent call last):
  File "./optimize.py", line 65, in <module>
    for grouping in unique_group(players, team_size, number_of_teams):
  File "./optimize.py", line 32, in <genexpr>
    v in it.chain.from_iterable(i)})) == len(list([x for x in
  File "./optimize.py", line 32, in <dictcomp>
    v in it.chain.from_iterable(i)})) == len(list([x for x in
TypeError: tuple indices must be integers or slices, not str

Which is confusing the crap out of me and makes clear I don't know what my code is doing. In ipython I took this sample output:

assignment = (
({'name': 'Patricia', 'skill': 4}, {'name': 'Bianca', 'skill': 4}),
({'name': 'Patricia', 'skill': 4}, {'name': 'Bianca', 'skill': 4})
)

Which is clearly undesirable, and formulated the following test:

len(list({v['name']:v for v in it.chain.from_iterable(assignment)})) == len([v for v in it.chain.from_iterable(assignment)])

Which correctly responds False. But it doesn't work in my method. That is probably because I am cargo cult coding at this point.

I understand what it.chain.from_iterable(i) does (it flattens the tuple of tuples of dictionaries to just a tuple of dictionaries). But it seems that the syntax {v['name']:v for v in...} does not do what I think it does; either that or I'm unpacking the wrong values! I am trying to test the unique dictionaries against the total dictionaries based on "Flatten list of lists" and "Python - List of unique dictionaries", but the answer giving me

>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ] 
>>> list({v['id']:v for v in L}.values())

isn't as easy to adapt in this circumstance as I thought, and I'm realizing I don't really know what is getting returned by it.product(groups, repeat=k). I'll have to investigate more.
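One way to investigate is to inspect a single item of that product directly. The following is a sketch under the assumption that `groups` is built as in the update (a list containing one generator) with k = 2 and n = 2. Flattening then appears to yield whole teams (tuples) rather than player dictionaries, which would explain the "tuple indices" error.

```python
import itertools as it

players = [{'name': 'Patricia', 'skill': 4}, {'name': 'Christopher', 'skill': 6},
           {'name': 'Nicholas', 'skill': 7}, {'name': 'Bianca', 'skill': 4}]

gen = (c for c in it.combinations(players, 2))
# product() over [gen] yields exactly one item: the tuple (gen, gen).
(assignment,) = it.product([gen], repeat=2)

# Flattening drains the shared generator once; each element is a 2-tuple of dicts.
flattened = list(it.chain.from_iterable(assignment))
print(len(flattened), type(flattened[0]).__name__)  # 6 tuple

try:
    {v['name']: v for v in flattened}  # v is a tuple here, not a dict
    message = None
except TypeError as exc:
    message = str(exc)
print(message)
```

The ipython test behaved differently because `assignment` there was a real tuple of teams, so one level of flattening did reach the dictionaries.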

This is where I'd leverage the new dataclasses with sets. You can make a dataclass hashable by setting frozen=True in the decorator. First you'd add your players to a set to get unique players. Then you'd get all the combinations of players for n-size teams. Then you could create a set of unique teams. Then create valid groupings in which no player is represented more than once across teams. Finally you could calculate the max disparity in the total team skill level across the grouping (leveraging combinations yet again) and use that to sort your valid groupings. So something like this:

from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet

import yaml


@dataclass(order=True, frozen=True)
class Player:
    name: str
    skill: int


@dataclass(order=True, frozen=True)
class Team:
    members: FrozenSet[Player]

    def total_skill(self):
        return sum(p.skill for p in self.members)


def is_valid(grouping):
    players = set()
    for team in grouping:
        for player in team.members:
            if player in players:
                return False
            players.add(player)
    return True


def max_team_disparity(grouping):
    return max(
        abs(t1.total_skill() - t2.total_skill())
        for t1, t2 in combinations(grouping, 2)
    )


def best_team_matchups(player_file, k, n):
    with open(player_file) as f:
        players = set(Player(p['name'], p['skill']) for p in yaml.safe_load(f))
    player_combs = combinations(players, n)
    unique_teams = set(Team(frozenset(team)) for team in player_combs)
    valid_groupings = set(g for g in combinations(unique_teams, k) if is_valid(g))
    for g in sorted(valid_groupings, key=max_team_disparity):
        print(g)


best_team_matchups('test.yaml', k=2, n=4)

Example output:

(
    Team(members=frozenset({
        Player(name='Chr', skill=6),
        Player(name='Christopher', skill=6),
        Player(name='Nicholas', skill=7),
        Player(name='Patricia', skill=4)
    })),
    Team(members=frozenset({
        Player(name='Bia', skill=4),
        Player(name='Bianca', skill=4),
        Player(name='Danny', skill=8),
        Player(name='Nicho', skill=7)
    }))
)
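The reason this answer can put players and teams into sets is that frozen dataclasses are hashable, whereas plain dicts are not. A minimal sketch of just that property, with a cut-down `Player` class and made-up variable names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Player:
    name: str
    skill: int


# Two equal frozen instances hash alike, so a set keeps only one of them.
first = Player('Bianca', 4)
duplicate = Player('Bianca', 4)
roster = {first, duplicate, Player('Nicholas', 7)}
print(len(roster))  # 2

# A frozenset of players is itself hashable, so teams can be set members too.
teams = {frozenset(roster), frozenset(roster)}
print(len(teams))  # 1

# Plain dicts, by contrast, cannot be put in a set.
try:
    {{'name': 'Bianca', 'skill': 4}}
    dict_error = None
except TypeError as exc:
    dict_error = str(exc)
print(dict_error)
```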

A list of dicts is not a good data structure for mapping what you actually want to rearrange, the player names, to their respective attributes, the skill ratings. You should transform the list of dicts to a name-to-skill mapping dict first:

player_skills = {player['name']: player['skill'] for player in players}
# player_skills becomes {'Patricia': 4, 'Christopher': 6, 'Nicholas': 7, 'Bianca': 4}

so that you can recursively deduct a combination of n players from the pool of players, iterable, until the number of groups reaches k:

from itertools import combinations
def unique_group(iterable, k, n, groups=0):
    if groups == k:
        yield []
        return  # stop here; without this the search keeps recursing past k groups
    pool = set(iterable)
    for combination in combinations(pool, n):
        for rest in unique_group(pool.difference(combination), k, n, groups + 1):
            yield [combination, *rest]

With your sample input, list(unique_group(player_skills, 2, 2)) returns:

[[('Bianca', 'Christopher'), ('Nicholas', 'Patricia')],
 [('Bianca', 'Nicholas'), ('Christopher', 'Patricia')],
 [('Bianca', 'Patricia'), ('Christopher', 'Nicholas')],
 [('Christopher', 'Nicholas'), ('Bianca', 'Patricia')],
 [('Christopher', 'Patricia'), ('Bianca', 'Nicholas')],
 [('Nicholas', 'Patricia'), ('Bianca', 'Christopher')]]

You can get the combination with the lowest variance in total skill ratings by using the min function with a key function that returns the skill difference between the team with the highest total skill ratings and the one with the lowest, which takes only O(n) in time complexity:

def variance(groups):
    total_skills = [sum(player_skills[player] for player in group) for group in groups]
    return max(total_skills) - min(total_skills)

so that min(unique_group(player_skills, 2, 2), key=variance) returns:

[('Bianca', 'Nicholas'), ('Christopher', 'Patricia')]
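Note that the generator lists every grouping once per ordering of its teams. If that matters, one possible way to collapse the mirror images is to compare groupings as sets of sets. The sketch below repeats the generator (with an early return at the base case) so that it runs on its own; `names`, `all_groupings` and `distinct` are names invented for this example.

```python
from itertools import combinations


def unique_group(iterable, k, n, groups=0):
    if groups == k:
        yield []
        return
    pool = set(iterable)
    for combination in combinations(pool, n):
        for rest in unique_group(pool.difference(combination), k, n, groups + 1):
            yield [combination, *rest]


names = ['Patricia', 'Christopher', 'Nicholas', 'Bianca']
all_groupings = list(unique_group(names, 2, 2))

# A frozenset of frozensets ignores both team order and order within a team.
distinct = {frozenset(frozenset(team) for team in g) for g in all_groupings}
print(len(all_groupings), len(distinct))  # 6 3
```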

Instead of trying to create every possible grouping of k sets of n elements (possibly including repeats) and then filtering down to the ones that don't have any overlap, let's directly build groupings that meet the criterion. This also avoids generating redundant groupings in different orders (the original code could also do this by using combinations rather than product in the last step).

The approach is:

  • Iterate over possibilities (combinations of n elements in the input) for the first set - by which I mean, the one that contains the first of the elements that will be chosen.
  • For each, recursively find possibilities for the remaining sets. They cannot use elements from the first set, and they also cannot use elements from before the first set (or else the first set wouldn't be first).

In order to combine the results elegantly, we use a recursive generator: rather than trying to build lists that contain results from the recursive calls, we just yield everything we need to. We represent each collection of group_count groups with a tuple of tuples (the inner tuples are the groups). At the base case, there is exactly one way to make no groups of elements - by just... doing that... yeah... - so we need to yield one value which is a tuple of no tuples of an irrelevant number of elements each - i.e., an empty tuple. In the other cases, we prepend the tuple for the current group to each result from the recursive call, yielding all those results.

from itertools import combinations

def non_overlapping_groups(group_count, group_size, population):
    if group_count == 0:
        yield ()
        return
    for indices in combinations(range(len(population)), group_size):
        current = (tuple(population[i] for i in indices),)
        remaining = [
            x for i, x in enumerate(population)
            if i not in indices and i > indices[0]
        ] if indices else population
        for recursive in non_overlapping_groups(group_count - 1, group_size, remaining):
            yield current + recursive

Let's try it:

>>> list(non_overlapping_groups(2, 3, 'abcdef'))
[(('a', 'b', 'c'), ('d', 'e', 'f')), (('a', 'b', 'd'), ('c', 'e', 'f')), (('a', 'b', 'e'), ('c', 'd', 'f')), (('a', 'b', 'f'), ('c', 'd', 'e')), (('a', 'c', 'd'), ('b', 'e', 'f')), (('a', 'c', 'e'), ('b', 'd', 'f')), (('a', 'c', 'f'), ('b', 'd', 'e')), (('a', 'd', 'e'), ('b', 'c', 'f')), (('a', 'd', 'f'), ('b', 'c', 'e')), (('a', 'e', 'f'), ('b', 'c', 'd'))]
>>> list(non_overlapping_groups(3, 2, 'abcdef'))
[(('a', 'b'), ('c', 'd'), ('e', 'f')), (('a', 'b'), ('c', 'e'), ('d', 'f')), (('a', 'b'), ('c', 'f'), ('d', 'e')), (('a', 'c'), ('b', 'd'), ('e', 'f')), (('a', 'c'), ('b', 'e'), ('d', 'f')), (('a', 'c'), ('b', 'f'), ('d', 'e')), (('a', 'd'), ('b', 'c'), ('e', 'f')), (('a', 'd'), ('b', 'e'), ('c', 'f')), (('a', 'd'), ('b', 'f'), ('c', 'e')), (('a', 'e'), ('b', 'c'), ('d', 'f')), (('a', 'e'), ('b', 'd'), ('c', 'f')), (('a', 'e'), ('b', 'f'), ('c', 'd')), (('a', 'f'), ('b', 'c'), ('d', 'e')), (('a', 'f'), ('b', 'd'), ('c', 'e')), (('a', 'f'), ('b', 'e'), ('c', 'd'))]
>>> # Some quick sanity checks
>>> len(list(non_overlapping_groups(2, 3, 'abcdef')))
10
>>> # With fewer input elements, obviously we can't do it.
>>> len(list(non_overlapping_groups(2, 3, 'abcde')))
0
>>> # Adding a 7th element, any element could be the odd one out,
>>> # and in each case we get another 10 possibilities, making 10 * 7 = 70.
>>> len(list(non_overlapping_groups(2, 3, 'abcdefg')))
70
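These counts can also be cross-checked against the standard closed form for choosing k unordered, disjoint groups of size s from N elements, N! / ((s!)^k · k! · (N − k·s)!). The helper below, `grouping_count`, is a name made up for this sketch and is not part of the answer's code.

```python
from math import factorial


def grouping_count(population_size, group_count, group_size):
    """Number of ways to pick group_count unordered, disjoint groups of group_size."""
    leftover = population_size - group_count * group_size
    if leftover < 0:
        return 0  # not enough elements to fill every group
    return factorial(population_size) // (
        factorial(group_size) ** group_count
        * factorial(group_count)
        * factorial(leftover)
    )


print(grouping_count(6, 2, 3))  # 10
print(grouping_count(5, 2, 3))  # 0
print(grouping_count(7, 2, 3))  # 70
print(grouping_count(6, 3, 2))  # 15
```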

I performance tested this against a modified version of the original (which also shows how to make it work properly with non-strings, and optimizes the sum calculation):

def unique_group(group_count, group_size, population):
    groups = list(combinations(population, group_size))
    return (
        i for i in combinations(groups, group_count) 
        if len({e for g in i for e in g}) == group_count * group_size
    )

Quickly verifying the equivalence:

>>> len(list(unique_group(3, 2, 'abcdef')))
15
>>> len(list(non_overlapping_groups(3, 2, 'abcdef')))
15
>>> set(unique_group(3, 2, 'abcdef')) == set(non_overlapping_groups(3, 2, 'abcdef'))
True

We see that even for fairly small examples (here, the output has 280 groupings), the brute-force approach has to filter through a lot:

>>> import timeit
>>> timeit.timeit("list(g(3, 3, 'abcdefghi'))", globals={'g': unique_group}, number=100)
5.895461600041017
>>> timeit.timeit("list(g(3, 3, 'abcdefghi'))", globals={'g': non_overlapping_groups}, number=100)
0.2303082060534507
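Because this generator works on indices, it should also accept the question's list of (unhashable) dictionaries directly. A sketch of the full team-picking step, with the generator repeated so the block runs on its own and a hypothetical `disparity` key function standing in for the scoring described in the question:

```python
from itertools import combinations


def non_overlapping_groups(group_count, group_size, population):
    if group_count == 0:
        yield ()
        return
    for indices in combinations(range(len(population)), group_size):
        current = (tuple(population[i] for i in indices),)
        remaining = [
            x for i, x in enumerate(population)
            if i not in indices and i > indices[0]
        ] if indices else population
        for recursive in non_overlapping_groups(group_count - 1, group_size, remaining):
            yield current + recursive


players = [{'name': 'Patricia', 'skill': 4}, {'name': 'Christopher', 'skill': 6},
           {'name': 'Nicholas', 'skill': 7}, {'name': 'Bianca', 'skill': 4}]


def disparity(grouping):
    """Gap between the strongest and weakest team's total skill."""
    totals = [sum(p['skill'] for p in team) for team in grouping]
    return max(totals) - min(totals)


groupings = list(non_overlapping_groups(2, 2, players))
best = min(groupings, key=disparity)
print(len(groupings))  # 3
print([[p['name'] for p in team] for team in best], disparity(best))
# [['Patricia', 'Christopher'], ['Nicholas', 'Bianca']] 1
```

Two groupings tie at a disparity of 1 for this data; `min` simply returns the first one it encounters.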
