Select one of each in a pandas grouping
I'm trying to create all of the possible combinations of players to assign to 4-person golf teams based on handicap type A, B, C, or D.
I've tried various itertools methods such as combinations and permutations but can't figure out the right approach.
from itertools import combinations, product, permutations
g = player_df.groupby(by = 'hcp_ABCD')
teams_listoflists = [group[1].index for group in g]
teams_combo_ndx = [player for player in permutations(teams_listoflists, 4)]
Here is my pandas table:
handicap name hcp_ABCD
0 24 Player1 D
1 21 Player2 D
2 8 Player3 B
3 14 Player4 C
4 20 Player5 D
5 13 Player6 C
6 -1 Player7 A
7 5 Player8 A
8 8 Player9 B
9 6 Player10 B
10 20 Player11 D
11 15 Player12 C
12 0 Player13 A
13 12 Player14 C
14 0 Player15 A
15 10 Player16 B
I would like the output to be all combinations (without duplicates) of teams such that each team has one player of each type A, B, C, and D. The output can be a table similar to the one above, grouped by "options".
Edit: I am adding this output example for clarity.
A Player B Player C Player D Player
option 1 team1 Player7 Player3 Player4 Player1
team2 Player8 Player9 Player6 Player2
team3 Player13 Player10 Player12 Player5
team4 Player15 Player16 Player14 Player11
option 2 team1 Player7 Player16 Player4 Player1
team2 Player8 Player3 Player6 Player2
team3 Player13 Player9 Player12 Player5
team4 Player15 Player10 Player14 Player11
...
A Player B Player C Player D Player
option n team1 Player7 Player3 Player4 Player11
team2 Player8 Player9 Player6 Player1
team3 Player13 Player10 Player12 Player2
team4 Player15 Player16 Player14 Player5
The point of the above is that I'm trying to find a generator that cycles through all combinations of players in each handicap group, so that the combination of options of teams is clear.
Edit #2: I've determined that the following code gets me to all of the potential team combinations:
g = df.groupby(by = 'hcp_ABCD')
combinations = [list(group[1].index) for group in g]
This creates a list of lists with the A players in list[0], the B players in list[1], etc.
And this gets an indexer for all possible combinations of teams:
from itertools import product
options = [option for option in product(*combinations)]
But how to assign these into the "options" (see the example above) while ensuring no duplication is what I'm stuck on.
Edit #3: A simpler version (a way to think about this problem) is to use the following sets:
A = ['A1', 'A2', 'A3', 'A4']
B = ['B1', 'B2', 'B3', 'B4']
C = ['C1', 'C2', 'C3', 'C4']
D = ['D1', 'D2', 'D3', 'D4']
This essentially does what the groupby above does (grouping by hcp_ABCD) but names each "A Player", "B Player", etc.
Possible combinations of teams:
team_combinations = [team for team in product(A, B, C, D)]
Then the next trick is to assign these to options of 4 teams each, with no duplication of players.
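A minimal sketch of one way to frame this step, assuming the order of the teams within an option does not matter: keep the A players in a fixed order and permute the B, C, and D players against them. Every player then appears exactly once per option, and no option is a reordering of another. The function name `team_options` is hypothetical, not something from itertools.

```python
from itertools import islice, permutations

A = ['A1', 'A2', 'A3', 'A4']
B = ['B1', 'B2', 'B3', 'B4']
C = ['C1', 'C2', 'C3', 'C4']
D = ['D1', 'D2', 'D3', 'D4']


def team_options(a, b, c, d):
    """Lazily yield every split of the players into len(a) teams.

    The A players stay in a fixed order so that merely reordering the
    teams never counts as a new option; B, C and D are permuted.
    """
    for perm_b in permutations(b):
        for perm_c in permutations(c):
            for perm_d in permutations(d):
                # zip pairs the i-th A player with the i-th B, C and D pick
                yield list(zip(a, perm_b, perm_c, perm_d))


# Look at the first two options without materialising all of them
for option in islice(team_options(A, B, C, D), 2):
    print(option)
```

With four players per class this generator yields 4! * 4! * 4! = 13,824 options, so consuming it lazily (for example with `islice`) is usually more practical than building the full list.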
Thanks for clarifying the expected result. Here is my answer, which I tested. It may not be in the exact format of your expected result, but I leave that to you to fix.
import pandas as pd

def is_duplicate_team(team, group):
    '''check if an option already exists'''
    return any(group == t for t in team)

def is_player_exists(group, arr):
    '''check if a player exists in a group'''
    return any(x in g for g in group for x in arr)

df = [(24, 'Player1', 'D'),
      (21, 'Player2', 'D'),
      (8, 'Player3', 'B'),
      (14, 'Player4', 'C'),
      (20, 'Player5', 'D'),
      (13, 'Player6', 'C'),
      (-1, 'Player7', 'A'),
      (5, 'Player8', 'A'),
      (8, 'Player9', 'B'),
      (6, 'Player10', 'B'),
      (20, 'Player11', 'D'),
      (15, 'Player12', 'C'),
      (0, 'Player13', 'A'),
      (12, 'Player14', 'C'),
      (0, 'Player15', 'A'),
      (10, 'Player16', 'B')]
df = pd.DataFrame(df, columns=['handicap', 'name', 'hcp_ABCD'])
from itertools import product
grouped = df.groupby('hcp_ABCD')['name'].apply(list).reset_index()
df_name = [n for n in grouped.name]
df_comb = [p for p in product(*df_name)]
# for each starting team, keep adding teams that share no player with the
# teams already chosen, until an option of four teams is complete
teams = []
for i in df_comb[:-1]:
    group = [i]
    for j in df_comb[1:]:
        if not is_player_exists(group, j):
            group.append(j)
        if len(group) == 4:
            if not is_duplicate_team(teams, group):
                teams.append(group)
            break  # all 16 players are placed, so no later team can be added
# below code will print the output similar to what you expected
for i, t in enumerate(teams, start=1):
    print('option: ', str(i))
    for p in t:
        print(p)
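The loop above keeps at most one option per starting team. As a hedged alternative (a different technique, not the answerer's method), the sketch below permutes the B, C, and D players against a fixed A order and stacks the first few options into a table indexed by option and team, close to the layout requested in the question. The helper names `iter_options` and `options_table` are hypothetical, and the sketch assumes every handicap class has the same number of players.

```python
from itertools import islice, permutations

import pandas as pd

players = pd.DataFrame({
    'name': [f'Player{i}' for i in range(1, 17)],
    'hcp_ABCD': list('DDBCDCAABBDCACAB'),  # classes of Player1..Player16
})

# One list of names per handicap class: {'A': [...], 'B': [...], ...}
by_class = players.groupby('hcp_ABCD')['name'].apply(list).to_dict()


def iter_options(by_class):
    """Yield each option as a list of (A, B, C, D) name tuples."""
    a = by_class['A']  # fixed order, so team order never creates duplicates
    for b in permutations(by_class['B']):
        for c in permutations(by_class['C']):
            for d in permutations(by_class['D']):
                yield list(zip(a, b, c, d))


def options_table(by_class, n_options):
    """Stack the first n_options into one frame indexed by (option, team)."""
    rows, index = [], []
    first = islice(iter_options(by_class), n_options)
    for opt_no, teams in enumerate(first, start=1):
        for team_no, team in enumerate(teams, start=1):
            rows.append(team)
            index.append((f'option {opt_no}', f'team{team_no}'))
    return pd.DataFrame(
        rows,
        index=pd.MultiIndex.from_tuples(index, names=['option', 'team']),
        columns=['A Player', 'B Player', 'C Player', 'D Player'],
    )


table = options_table(by_class, 2)
print(table)
```

Only the requested number of options is materialised, which matters because the full enumeration for this table has 13,824 options.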
I made a suggestion in the comments. Here is an implementation:
import pandas as pd
from functools import reduce

data = [
    (24, 'Player1', 'D'),
    (21, 'Player2', 'D'),
    (8, 'Player3', 'B'),
    (8, 'Player4', 'B'),
    (14, 'Player5', 'C'),
    (13, 'Player6', 'C'),
    (-1, 'Player7', 'A'),
    (5, 'Player8', 'A')
]
df = pd.DataFrame(
    data,
    columns=['handicap', 'name', 'hcp_ABCD']
)
# one frame per handicap class, with the columns renamed after the class
dfs = [
    grp_df.drop(columns="hcp_ABCD")
          .rename(columns={"name": f"player_{hndcp}",
                           "handicap": f"handicap_{hndcp}"})
    for hndcp, grp_df in df.assign(key=1)
                           .groupby("hcp_ABCD")
]
# merging on the constant "key" column pairs every row with every other row
result = reduce(
    lambda left, right: left.merge(right, how="outer", on="key"),
    dfs
).drop(columns="key")
print(result)
Output:
handicap_A player_A handicap_B player_B handicap_C player_C handicap_D player_D
0 -1 Player7 8 Player3 14 Player5 24 Player1
1 -1 Player7 8 Player3 14 Player5 21 Player2
2 -1 Player7 8 Player3 13 Player6 24 Player1
3 -1 Player7 8 Player3 13 Player6 21 Player2
4 -1 Player7 8 Player4 14 Player5 24 Player1
5 -1 Player7 8 Player4 14 Player5 21 Player2
6 -1 Player7 8 Player4 13 Player6 24 Player1
7 -1 Player7 8 Player4 13 Player6 21 Player2
8 5 Player8 8 Player3 14 Player5 24 Player1
9 5 Player8 8 Player3 14 Player5 21 Player2
10 5 Player8 8 Player3 13 Player6 24 Player1
11 5 Player8 8 Player3 13 Player6 21 Player2
12 5 Player8 8 Player4 14 Player5 24 Player1
13 5 Player8 8 Player4 14 Player5 21 Player2
14 5 Player8 8 Player4 13 Player6 24 Player1
15 5 Player8 8 Player4 13 Player6 21 Player2
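The constant `key` column above is a way to emulate a cross join. As a sketch under the assumption that pandas 1.2 or newer is installed, `DataFrame.merge` accepts `how="cross"`, which should produce the same 16 candidate teams without the helper column:

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame(
    [(24, 'Player1', 'D'), (21, 'Player2', 'D'),
     (8, 'Player3', 'B'), (8, 'Player4', 'B'),
     (14, 'Player5', 'C'), (13, 'Player6', 'C'),
     (-1, 'Player7', 'A'), (5, 'Player8', 'A')],
    columns=['handicap', 'name', 'hcp_ABCD'],
)

# One frame per handicap class, with columns renamed after the class
dfs = [
    grp_df.drop(columns='hcp_ABCD')
          .rename(columns={'name': f'player_{hndcp}',
                           'handicap': f'handicap_{hndcp}'})
    for hndcp, grp_df in df.groupby('hcp_ABCD')
]

# how="cross" pairs every row on the left with every row on the right
result = reduce(lambda left, right: left.merge(right, how='cross'), dfs)
print(result.shape)
```

Each row of `result` is still only one candidate team; grouping candidate teams into non-overlapping options is a separate step.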
The following approach uses a cartesian product and then groups twice to distribute the players into teams with a set of unique handicaps.
import pandas as pd
from io import StringIO  # pandas.compat no longer provides StringIO in current pandas
print(pd.__version__)
pd.options.display.max_rows = 664
csvdata = StringIO("""handicap,name,hcp_ABCD
24,Player1,D
21,Player2,D
8,Player3,B
14,Player4,C
20,Player5,D
13,Player6,C
-1,Player7,A
5,Player8,A
8,Player9,B
6,Player10,B
20,Player11,D
15,Player12,C
0,Player13,A
12,Player14,C
0,Player15,A
10,Player16,B""")
df=pd.read_csv(csvdata)
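# Generate all possible groups
# https://stackoverflow.com/questions/53699012/performant-cartesian-product-cross-join-with-pandas
def cartesian_product(left, right):
    # drop(columns=...) replaces the positional axis argument, which newer pandas rejects
    return left.assign(key=1).merge(right.assign(key=1), on='key').drop(columns='key')

def distribute_players(x):
    # number the four players within each (name_x, handicap class) group 0..3
    x['distribute'] = range(0, 4)
    return x
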
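df = cartesian_product(df, df.copy())
# group_keys=False keeps the original row index, so name_x stays an ordinary column for the next groupby
df = df.groupby(['name_x', 'hcp_ABCD_y'], group_keys=False).apply(distribute_players)
df['team'] = df.groupby(['name_x', 'distribute']).ngroup()
print(df[['handicap_y','name_y','hcp_ABCD_y','team']].sort_values(['team']))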
handicap_y name_y hcp_ABCD_y team
0 24 Player1 D 0
2 8 Player3 B 0
3 14 Player4 C 0
6 -1 Player7 A 0
1 21 Player2 D 1
5 13 Player6 C 1
7 5 Player8 A 1
8 8 Player9 B 1