All combinations of set of dictionaries into K N-sized groups
I thought this would be straightforward; unfortunately, it is not.
I am trying to build a function that takes an iterable of dictionaries (i.e., a list of unique dictionaries) and returns a list of lists of unique groupings of the dictionaries.
Given x players, I would like to form k teams of size n. This question and set of answers from CMSDK is the closest thing to a solution I can find. In adapting it from processing strings of letters to dictionaries I am finding my Python skills inadequate.
The original function that I am adapting comes from the second answer:
import itertools as it

def unique_group(iterable, k, n):
    """Return an iterator, comprising groups of size `k` with combinations of size `n`."""
    # Build separate combinations of `n` characters
    groups = ("".join(i) for i in it.combinations(iterable, n))  # 'AB', 'AC', 'AD', ...
    # Build unique groups of `k` by keeping the longest sets of characters
    return (i for i in it.product(groups, repeat=k)
            if len(set("".join(i))) == sum((map(len, i))))  # ('AB', 'CD'), ('AB', 'CE'), ...
My current adaptation (which utterly fails with TypeError: object of type 'generator' has no len() because of the call to map(len, i)):
def unique_group(iterable, k, n):
    groups = []
    groups.append((i for i in it.combinations(iterable, n)))
    return ( i for i in it.product(groups, repeat=k) if len(set(i)) == sum((map(len, i))) )
For a bit of context: I am trying to programmatically divide a group of players into teams for Christmas Trivia based on their skills. The list of dictionaries is formed from a yaml file that looks like
- name: Patricia
  skill: 4
- name: Christopher
  skill: 6
- name: Nicholas
  skill: 7
- name: Bianca
  skill: 4
Which, after yaml.load, produces a list of dictionaries:
players = [{'name':'Patricia', 'skill':4},{'name':'Christopher','skill':6},
{'name':'Nicholas','skill':7},{'name':'Bianca','skill':4}]
So I expect output that would look like a list of these (where k = 2 and n = 2):
(
    # Team assignment grouping 1
    (
        # Team 1
        ( {'name': 'Patricia', 'skill': 4}, {'name': 'Christopher', 'skill': 6} ),
        # Team 2
        ( {'name': 'Nicholas', 'skill': 7}, {'name': 'Bianca', 'skill': 4} )
    ),
    # Team assignment grouping 2
    (
        # Team 1
        ( {'name': 'Patricia', 'skill': 4}, {'name': 'Bianca', 'skill': 4} ),
        # Team 2
        ( {'name': 'Nicholas', 'skill': 7}, {'name': 'Christopher', 'skill': 6} )
    ),
    ...,
    # More unique lists
)
Each team assignment grouping needs to have unique players across teams (i.e., the same player cannot be on multiple teams in a team assignment grouping), and each team assignment grouping needs to be unique.
Once I have the list of team assignment combinations I will sum up the skills in every group, take the difference between the highest skill and lowest skill, and choose the grouping (with variance) with the lowest difference between highest and lowest skills.
I will admit I do not understand this code fully. I understand the first assignment to create a list of all the combinations of the letters in a string, and the return statement to find the product under the condition that the product does not contain the same letter in different groups.
My initial attempt was to simply take it.product(it.combinations(iterable, n), repeat=k), but this does not achieve uniqueness across groups (i.e., I get the same player on different teams in one grouping).
Thanks in advance, and Merry Christmas!
After a considerable amount of fiddling I have gotten the adaptation to this:
This does not work:
def unique_group(iterable, k, n):
    groups = []
    groups.append((i for i in it.combinations(iterable, n)))
    return (i for i in it.product(groups, repeat=k)\
        if len(list({v['name']:v for v in it.chain.from_iterable(i)}.values())) ==\
        len(list([x for x in it.chain.from_iterable(i)])))
I get an error:
Traceback (most recent call last):
  File "./optimize.py", line 65, in <module>
    for grouping in unique_group(players, team_size, number_of_teams):
  File "./optimize.py", line 32, in <genexpr>
    v in it.chain.from_iterable(i)})) == len(list([x for x in
  File "./optimize.py", line 32, in <dictcomp>
    v in it.chain.from_iterable(i)})) == len(list([x for x in
TypeError: tuple indices must be integers or slices, not str
Which is confusing the crap out of me and makes clear I don't know what my code is doing. In ipython I took this sample output:
assignment = (
({'name': 'Patricia', 'skill': 4}, {'name': 'Bianca', 'skill': 4}),
({'name': 'Patricia', 'skill': 4}, {'name': 'Bianca', 'skill': 4})
)
Which is clearly undesirable, and formulated the following test:
len(list({v['name']:v for v in it.chain.from_iterable(assignment)})) == len([v for v in it.chain.from_iterable(assignment)])
Which correctly responds False. But it doesn't work in my method. That is probably because I am cargo cult coding at this point.
I understand what it.chain.from_iterable(i) does (it flattens the tuple of tuples of dictionaries to just a tuple of dictionaries). But it seems that the syntax {v['name']:v for v in...} does not do what I think it does; either that or I'm unpacking the wrong values! I am trying to test the unique dictionaries against the total dictionaries based on Flatten list of lists and Python - List of unique dictionaries, but the answer giving me
>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ]
>>> list({v['id']:v for v in L}.values())
Isn't as easy to adapt in this circumstance as I thought, and I'm realizing I don't really know what is getting returned in the it.product(groups, repeat=k). I'll have to investigate more.
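For what it's worth, a small sketch of what it.product(groups, repeat=k) hands back in the adaptation may help. It uses the sample players from the question; the names first, flattened, teams and valid are illustrative only. Because groups is a list holding a single generator, each product element is a tuple that repeats that one generator, and flattening one level gives the combination tuples rather than the dictionaries, which appears to be where the "tuple indices" error comes from. Materializing the combinations into a list first is one possible way around it.

```python
import itertools as it

players = [
    {'name': 'Patricia', 'skill': 4},
    {'name': 'Christopher', 'skill': 6},
    {'name': 'Nicholas', 'skill': 7},
    {'name': 'Bianca', 'skill': 4},
]

# What the failing adaptation builds: a list holding ONE generator object.
groups = []
groups.append((c for c in it.combinations(players, 2)))

# product over a one-element list repeats that single element k times.
first = next(it.product(groups, repeat=2))
same_generator = first[0] is first[1]

# Flattening one level yields the 2-tuples of dicts, not the dicts themselves.
flattened = list(it.chain.from_iterable(first))

try:
    {v['name']: v for v in flattened}   # v is a tuple here, so v['name'] fails
except TypeError as exc:
    message = str(exc)

# Possible fix (an assumption about intent): make `groups` a list of teams,
# so each product element is a tuple of k teams and flattening gives dicts.
teams = list(it.combinations(players, 2))
valid = [
    g for g in it.product(teams, repeat=2)
    if len({p['name'] for p in it.chain.from_iterable(g)}) == 2 * 2
]
```

Note that valid still contains every grouping once per ordering of its teams, since product treats (team A, team B) and (team B, team A) as different results.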
This is where I'd leverage the new dataclasses with sets. You can make a dataclass hashable by setting frozen=True in the decorator. First you'd add your players to a set to get unique players. Then you'd get all the combinations of players for n size teams. Then you could create a set of unique teams. Then create valid groupings in which no player is represented more than once across teams. Finally you could calculate the max disparity in the total team skill level across the grouping (leveraging combinations yet again) and use that to sort your valid groupings. So something like this:
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet

import yaml

@dataclass(order=True, frozen=True)
class Player:
    name: str
    skill: int

@dataclass(order=True, frozen=True)
class Team:
    members: FrozenSet[Player]

    def total_skill(self):
        return sum(p.skill for p in self.members)

def is_valid(grouping):
    # A grouping is valid when no player appears on more than one team
    players = set()
    for team in grouping:
        for player in team.members:
            if player in players:
                return False
            players.add(player)
    return True

def max_team_disparity(grouping):
    # Largest difference in total skill between any two teams in the grouping
    return max(
        abs(t1.total_skill() - t2.total_skill())
        for t1, t2 in combinations(grouping, 2)
    )

def best_team_matchups(player_file, k, n):
    with open(player_file) as f:
        # safe_load avoids the Loader argument that newer PyYAML versions require for yaml.load
        players = set(Player(p['name'], p['skill']) for p in yaml.safe_load(f))
    player_combs = combinations(players, n)
    unique_teams = set(Team(frozenset(team)) for team in player_combs)
    valid_groupings = set(g for g in combinations(unique_teams, k) if is_valid(g))
    for g in sorted(valid_groupings, key=max_team_disparity):
        print(g)

best_team_matchups('test.yaml', k=2, n=4)
Example output:
(
    Team(members=frozenset({
        Player(name='Chr', skill=6),
        Player(name='Christopher', skill=6),
        Player(name='Nicholas', skill=7),
        Player(name='Patricia', skill=4)
    })),
    Team(members=frozenset({
        Player(name='Bia', skill=4),
        Player(name='Bianca', skill=4),
        Player(name='Danny', skill=8),
        Player(name='Nicho', skill=7)
    }))
)
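The hashability point this answer relies on can be checked in isolation. The following is only a minimal sketch: MutablePlayer is a hypothetical class added for contrast, and the roster values are made up from the question's sample names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Player:
    name: str
    skill: int

@dataclass  # default: eq=True, frozen=False, so instances are unhashable
class MutablePlayer:
    name: str
    skill: int

# Frozen instances hash by field values, so equal players collapse in a set.
roster = {Player('Bianca', 4), Player('Bianca', 4), Player('Nicholas', 7)}

try:
    hash(MutablePlayer('Bianca', 4))
    mutable_is_hashable = True
except TypeError:
    mutable_is_hashable = False
```

Here roster ends up with two entries, and hashing the non-frozen variant raises TypeError, which is why the decorator argument matters before players or teams can go into a set or frozenset.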
A list of dicts is not a good data structure for mapping what you actually want to rearrange, the player names, to their respective attributes, the skill ratings. You should transform the list of dicts to a name-to-skill mapping dict first:
player_skills = {player['name']: player['skill'] for player in players}
# player_skills becomes {'Patricia': 4, 'Christopher': 6, 'Nicholas': 7, 'Bianca': 4}
so that you can recursively deduct a combination of n players from the pool of players iterable, until the number of groups reaches k:
from itertools import combinations

def unique_group(iterable, k, n, groups=0):
    if groups == k:
        yield []
        return  # stop here; otherwise leftover players would form extra groups beyond k
    pool = set(iterable)
    for combination in combinations(pool, n):
        for rest in unique_group(pool.difference(combination), k, n, groups + 1):
            yield [combination, *rest]
With your sample input, list(unique_group(player_skills, 2, 2)) returns (the order can vary between runs, because the pool is an unordered set):
[[('Bianca', 'Christopher'), ('Nicholas', 'Patricia')],
 [('Bianca', 'Nicholas'), ('Christopher', 'Patricia')],
 [('Bianca', 'Patricia'), ('Christopher', 'Nicholas')],
 [('Christopher', 'Nicholas'), ('Bianca', 'Patricia')],
 [('Christopher', 'Patricia'), ('Bianca', 'Nicholas')],
 [('Nicholas', 'Patricia'), ('Bianca', 'Christopher')]]
You can get the combination with the lowest variance in total skill ratings by using the min function with a key function that returns the skill difference between the team with the highest total skill rating and the one with the lowest, which takes only O(n) in time complexity:
def variance(groups):
    total_skills = [sum(player_skills[player] for player in group) for group in groups]
    return max(total_skills) - min(total_skills)
so that min(unique_group(player_skills, 2, 2), key=variance) returns a grouping with the smallest difference, for example (two pairings tie at a difference of 1 here, so which one comes back depends on iteration order):
[('Bianca', 'Nicholas'), ('Christopher', 'Patricia')]
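The listing above contains each grouping twice, once per ordering of the teams. If that matters, one possible way to drop the repeats is to key each grouping on a frozenset of frozensets. This is only a sketch: distinct_groupings is a hypothetical helper, not part of the answer, and unique_group is repeated here (with an early return after the base case) so the block runs on its own.

```python
from itertools import combinations

def unique_group(iterable, k, n, groups=0):
    if groups == k:
        yield []
        return
    pool = set(iterable)
    for combination in combinations(pool, n):
        for rest in unique_group(pool.difference(combination), k, n, groups + 1):
            yield [combination, *rest]

def distinct_groupings(iterable, k, n):
    """Yield each grouping once, ignoring team order and order within a team."""
    seen = set()
    for grouping in unique_group(iterable, k, n):
        key = frozenset(frozenset(team) for team in grouping)
        if key not in seen:
            seen.add(key)
            yield grouping

names = ['Patricia', 'Christopher', 'Nicholas', 'Bianca']
all_groupings = list(unique_group(names, 2, 2))      # every ordering of teams
distinct = list(distinct_groupings(names, 2, 2))     # one entry per grouping
```

With four players and two teams of two, all_groupings has six entries and distinct has three. The filter still enumerates every ordering before discarding it, so it tidies the output rather than reducing the work.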
Instead of trying to create every possible grouping of k sets of n elements (possibly including repeats) and then filtering down to the ones that don't have any overlap, let's directly build groupings that meet the criterion. This also avoids generating redundant groupings in different orders (the original code could also do this by using combinations rather than product in the last step).
The approach is: iterate over the possibilities (combinations of n elements in the input) for the first set - by which I mean, the one that contains the first of the elements that will be chosen; then, for each of those, recursively build the remaining sets from the elements that are left.
In order to combine the results elegantly, we use a recursive generator: rather than trying to build lists that contain results from the recursive calls, we just yield everything we need to. We represent each collection of group_count many elements with a tuple of tuples (the inner tuples are the groups). At the base case, there is exactly one way to make no groups of elements - by just... doing that... yeah... - so we need to yield one value which is a tuple of no tuples of an irrelevant number of elements each - i.e., an empty tuple. In the other cases, we prepend the tuple for the current group to each result from the recursive call, yielding all those results.
from itertools import combinations

def non_overlapping_groups(group_count, group_size, population):
    if group_count == 0:
        yield ()
        return
    for indices in combinations(range(len(population)), group_size):
        current = (tuple(population[i] for i in indices),)
        remaining = [
            x for i, x in enumerate(population)
            if i not in indices and i > indices[0]
        ] if indices else population
        for recursive in non_overlapping_groups(group_count - 1, group_size, remaining):
            yield current + recursive
Let's try it:
>>> list(non_overlapping_groups(2, 3, 'abcdef'))
[(('a', 'b', 'c'), ('d', 'e', 'f')), (('a', 'b', 'd'), ('c', 'e', 'f')), (('a', 'b', 'e'), ('c', 'd', 'f')), (('a', 'b', 'f'), ('c', 'd', 'e')), (('a', 'c', 'd'), ('b', 'e', 'f')), (('a', 'c', 'e'), ('b', 'd', 'f')), (('a', 'c', 'f'), ('b', 'd', 'e')), (('a', 'd', 'e'), ('b', 'c', 'f')), (('a', 'd', 'f'), ('b', 'c', 'e')), (('a', 'e', 'f'), ('b', 'c', 'd'))]
>>> list(non_overlapping_groups(3, 2, 'abcdef'))
[(('a', 'b'), ('c', 'd'), ('e', 'f')), (('a', 'b'), ('c', 'e'), ('d', 'f')), (('a', 'b'), ('c', 'f'), ('d', 'e')), (('a', 'c'), ('b', 'd'), ('e', 'f')), (('a', 'c'), ('b', 'e'), ('d', 'f')), (('a', 'c'), ('b', 'f'), ('d', 'e')), (('a', 'd'), ('b', 'c'), ('e', 'f')), (('a', 'd'), ('b', 'e'), ('c', 'f')), (('a', 'd'), ('b', 'f'), ('c', 'e')), (('a', 'e'), ('b', 'c'), ('d', 'f')), (('a', 'e'), ('b', 'd'), ('c', 'f')), (('a', 'e'), ('b', 'f'), ('c', 'd')), (('a', 'f'), ('b', 'c'), ('d', 'e')), (('a', 'f'), ('b', 'd'), ('c', 'e')), (('a', 'f'), ('b', 'e'), ('c', 'd'))]
>>> # Some quick sanity checks
>>> len(list(non_overlapping_groups(2, 3, 'abcdef')))
10
>>> # With fewer input elements, obviously we can't do it.
>>> len(list(non_overlapping_groups(2, 3, 'abcde')))
0
>>> # Adding a 7th element, any element could be the odd one out,
>>> # and in each case we get another 10 possibilities, making 10 * 7 = 70.
>>> len(list(non_overlapping_groups(2, 3, 'abcdefg')))
70
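The counts in these sanity checks can also be predicted in closed form, which gives an independent cross-check. Assuming N distinct elements, k groups of size n, and no significance to the order of groups or of elements within a group, the count is C(N, k*n) * (k*n)! / ((n!)^k * k!). The helper name expected_grouping_count is hypothetical, and math.comb needs Python 3.8 or later.

```python
from math import comb, factorial

def expected_grouping_count(population_size, group_count, group_size):
    """Number of ways to pick `group_count` disjoint, unordered groups of `group_size`."""
    chosen = group_count * group_size
    # Choose which elements take part, then split them into unordered groups.
    ways_to_choose = comb(population_size, chosen)   # 0 if too few elements
    ways_to_split = factorial(chosen) // (
        factorial(group_size) ** group_count * factorial(group_count)
    )
    return ways_to_choose * ways_to_split

counts = {
    'abcdef, 2 groups of 3': expected_grouping_count(6, 2, 3),
    'abcdef, 3 groups of 2': expected_grouping_count(6, 3, 2),
    'abcde, 2 groups of 3': expected_grouping_count(5, 2, 3),
    'abcdefg, 2 groups of 3': expected_grouping_count(7, 2, 3),
}
```

These evaluate to 10, 15, 0 and 70 respectively, matching the lengths reported in the session above.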
I performance tested this against a modified version of the original (which also shows how to make it work properly with non-strings, and optimizes the sum calculation):
def unique_group(group_count, group_size, population):
    # Uses the same `from itertools import combinations` import as above
    groups = list(combinations(population, group_size))
    return (
        i for i in combinations(groups, group_count)
        if len({e for g in i for e in g}) == group_count * group_size
    )
Quickly verifying the equivalence:
>>> len(list(unique_group(3, 2, 'abcdef')))
15
>>> len(list(non_overlapping_groups(3, 2, 'abcdef')))
15
>>> set(unique_group(3, 2, 'abcdef')) == set(non_overlapping_groups(3, 2, 'abcdef'))
True
We see that even for fairly small examples (here, the output has 280 groupings), the brute-force approach has to filter through a lot:
>>> import timeit
>>> timeit.timeit("list(g(3, 3, 'abcdefghi'))", globals={'g': unique_group}, number=100)
5.895461600041017
>>> timeit.timeit("list(g(3, 3, 'abcdefghi'))", globals={'g': non_overlapping_groups}, number=100)
0.2303082060534507
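To tie this back to the question's data, here is a rough sketch of feeding the generator straight into min with the player dictionaries. The function is repeated so the block runs on its own; skill_spread is a hypothetical key function following the question's "highest total minus lowest total" criterion, and the players are the question's sample input.

```python
from itertools import combinations

def non_overlapping_groups(group_count, group_size, population):
    if group_count == 0:
        yield ()
        return
    for indices in combinations(range(len(population)), group_size):
        current = (tuple(population[i] for i in indices),)
        remaining = [
            x for i, x in enumerate(population)
            if i not in indices and i > indices[0]
        ] if indices else population
        for recursive in non_overlapping_groups(group_count - 1, group_size, remaining):
            yield current + recursive

players = [
    {'name': 'Patricia', 'skill': 4},
    {'name': 'Christopher', 'skill': 6},
    {'name': 'Nicholas', 'skill': 7},
    {'name': 'Bianca', 'skill': 4},
]

def skill_spread(grouping):
    """Gap between the strongest and weakest team totals in one grouping."""
    totals = [sum(p['skill'] for p in team) for team in grouping]
    return max(totals) - min(totals)

# Elements are addressed by index, so unhashable dicts work unchanged,
# and min consumes the generator without building the full list first.
best = min(non_overlapping_groups(2, 2, players), key=skill_spread)
best_names = [tuple(p['name'] for p in team) for team in best]
```

For this input there are three groupings, two of which tie at a spread of 1; min keeps the first one it encounters.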