
Select one of each in a pandas grouping

I'm trying to create all of the possible combinations of pairings of players to assign into 4-person golf teams based on handicap type A, B, C, or D.

I've tried various itertools methods such as combinations and permutations but can't figure out the right approach.

from itertools import combinations, product, permutations
g = player_df.groupby(by = 'hcp_ABCD')
teams_listoflists = [group[1].index for group in g]
teams_combo_ndx = [player for player in permutations(teams_listoflists, 4)]

Here is my pandas table:

        handicap      name hcp_ABCD
0         24   Player1        D
1         21   Player2        D
2          8   Player3        B
3         14   Player4        C
4         20   Player5        D
5         13   Player6        C
6         -1   Player7        A
7          5   Player8        A
8          8   Player9        B
9          6  Player10        B
10        20  Player11        D
11        15  Player12        C
12         0  Player13        A
13        12  Player14        C
14         0  Player15        A
15        10  Player16        B

I would like the output to be all combinations (without duplicates) of teams such that each team has one type A, B, C, and D player. This output can be a table similar to the one above, grouped by "options."

Edit: I am adding this output example for clarity.

                       A Player     B Player     C Player   D Player
    option 1  team1    Player7      Player3      Player4    Player1
              team2    Player8      Player9      Player6    Player2
              team3    Player13     Player10     Player12   Player5
              team4    Player15     Player16     Player14   Player11

    option 2  team1    Player7      Player16     Player4    Player1
              team2    Player8      Player3      Player6    Player2
              team3    Player13     Player9      Player12   Player5
              team4    Player15     Player10     Player14   Player11

    ...


                       A Player     B Player     C Player   D Player
    option n  team1    Player7      Player3      Player4    Player11
              team2    Player8      Player9      Player6    Player1
              team3    Player13     Player10     Player12   Player2
              team4    Player15     Player16     Player14   Player5

The point of the above is that I'm trying to find a generator that cycles through all combinations of players in each handicap group so that the combination of options of teams is clear.

Edit #2: I've determined that this code produces all of the potential team combinations:

g = df.groupby(by = 'hcp_ABCD')
combinations = [list(group[1].index) for group in g]

This creates a list of lists with the A Players in list[0], B Players in list[1], etc.

And this gets an indexer for all possible combinations of teams:

from itertools import product
options = [option for option in product(*combinations)]

But how to assign these out into the "options" (see the example above) and ensure no duplication is what I'm stuck on.

Edit #3: A simpler version (a way to think about this problem) is to use the following sets:

A = ['A1', 'A2', 'A3', 'A4']
B = ['B1', 'B2', 'B3', 'B4']
C = ['C1', 'C2', 'C3', 'C4']
D = ['D1', 'D2', 'D3', 'D4']

This essentially does what the groupby does above (grouping by hcp_ABCD) but names each "A Player", "B Player", etc.

Possible combinations of teams:

team_combinations = [team for team in product(A, B, C, D)]

Then the next trick is to assign these into combinations of 4 teams with no duplication of players.

Thanks for clarifying the expected result. Here is my answer, which I tested. It may not be in the exact format of your expected result, but I leave it to you to adjust that.

import pandas as pd
def is_duplicate_team(team, group):
    '''check if an option already exists'''
    return any(group == t for t in team)
def is_player_exists(group, arr):
    '''check if a player exists in a group'''
    return any(x in g for g in group for x in arr)

df = [         (24   ,'Player1','D'),
         (21   ,'Player2','D'),
          (8   ,'Player3','B'),
         (14   ,'Player4','C'),
         (20   ,'Player5','D'),
         (13   ,'Player6','C'),
         (-1   ,'Player7','A'),
          (5   ,'Player8','A'),
          (8   ,'Player9','B'),
          (6  ,'Player10','B'),
        (20  ,'Player11','D'),
        (15  ,'Player12','C'),
         (0  ,'Player13','A'),
        (12  ,'Player14','C'),
         (0  ,'Player15','A'),
        (10  ,'Player16','B')]
df = pd.DataFrame(df, columns=['handicap', 'name', 'hcp_ABCD'])
from itertools import product
grouped = df.groupby('hcp_ABCD')['name'].apply(list).reset_index()
df_name = [n for n in grouped.name]
df_comb = [p for p in product(*df_name)]

# Greedily build one option per starting team: walk the candidate teams and
# keep each one that shares no player with the teams already chosen.
teams=[]
for i in df_comb[:-1]:
    group=[i]
    for j in df_comb[1:]:
        if not is_player_exists(group, j):
            group.append(j)
        if len(group) == 4:
            if not is_duplicate_team(teams, group):
                teams.append(group)
            break  # the option is complete, so stop scanning

# below code will print the output similar to what you expected
i=0
for t in teams:
    i+=1
    print('option: ', str(i) )
    for p in t:
        print(p)
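
As far as I can tell, the loop above adds at most one option per starting team, so it lists only part of the possibilities. If every option is needed, one possible sketch (not the answerer's code; `team_options` is a name made up here) keeps the A players in a fixed order and permutes the B, C and D players against them. With four players per class that gives 24 * 24 * 24 = 13824 options, and because every team is identified by its A player, no option appears twice. It uses the simplified A/B/C/D lists from the question.

```python
from itertools import islice, permutations, product

def team_options(a_players, b_players, c_players, d_players):
    """Yield every split of the players into teams of one A, B, C and D.

    The A players stay in a fixed order, so each team is identified by its
    A player and no option is produced twice.
    """
    for b, c, d in product(permutations(b_players),
                           permutations(c_players),
                           permutations(d_players)):
        yield list(zip(a_players, b, c, d))

A = ['A1', 'A2', 'A3', 'A4']
B = ['B1', 'B2', 'B3', 'B4']
C = ['C1', 'C2', 'C3', 'C4']
D = ['D1', 'D2', 'D3', 'D4']

options = list(team_options(A, B, C, D))
print(len(options))  # 24 * 24 * 24 = 13824

# Show the first two options without materialising the whole list again.
for number, option in enumerate(islice(team_options(A, B, C, D), 2), start=1):
    print('option', number)
    for team in option:
        print('   ', team)
```

Since `team_options` is a generator, the options can also be consumed one at a time instead of being collected into a list.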

I made a suggestion in the comments. Here is an implementation:

import pandas as pd
from functools import reduce

data = [
    (24,'Player1','D'),
    (21,'Player2','D'),
    (8,'Player3','B'),
    (8,'Player4','B'),
    (14,'Player5','C'),
    (13,'Player6','C'),
    (-1,'Player7','A'),
    (5,'Player8','A')
]
df = pd.DataFrame(
    data,
    columns=['handicap', 'name', 'hcp_ABCD']
)

dfs = [
    grp_df.drop(columns="hcp_ABCD")
          .rename(columns={"name": f"player_{hndcp}",
                           "handicap": f"handicap_{hndcp}"})
    for hndcp, grp_df in df.assign(key=1)
                           .groupby("hcp_ABCD")
]
result = reduce(
    lambda left, right: left.merge(right, how="outer", on="key"),
    dfs
).drop(columns="key")
print(result)

Output:

    handicap_A player_A  handicap_B player_B  handicap_C player_C  handicap_D player_D
0           -1  Player7           8  Player3          14  Player5          24  Player1
1           -1  Player7           8  Player3          14  Player5          21  Player2
2           -1  Player7           8  Player3          13  Player6          24  Player1
3           -1  Player7           8  Player3          13  Player6          21  Player2
4           -1  Player7           8  Player4          14  Player5          24  Player1
5           -1  Player7           8  Player4          14  Player5          21  Player2
6           -1  Player7           8  Player4          13  Player6          24  Player1
7           -1  Player7           8  Player4          13  Player6          21  Player2
8            5  Player8           8  Player3          14  Player5          24  Player1
9            5  Player8           8  Player3          14  Player5          21  Player2
10           5  Player8           8  Player3          13  Player6          24  Player1
11           5  Player8           8  Player3          13  Player6          21  Player2
12           5  Player8           8  Player4          14  Player5          24  Player1
13           5  Player8           8  Player4          14  Player5          21  Player2
14           5  Player8           8  Player4          13  Player6          24  Player1
15           5  Player8           8  Player4          13  Player6          21  Player2
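
Each row of this table is one candidate team. To get from candidate teams to "options", the rows still have to be combined so that no player is used twice. A rough sketch of one way to do that with backtracking in plain Python follows; `disjoint_options` is a hypothetical helper, and the groups are assumed to be plain lists of names in A, B, C, D order, taken from the reduced data above.

```python
from itertools import product

def disjoint_options(groups):
    """Yield lists of teams that together use every player exactly once.

    `groups` holds one list of player names per handicap class, all of the
    same length. Teams are picked in increasing index order, so each set of
    teams is yielded only once.
    """
    teams = list(product(*groups))   # every candidate team
    teams_needed = len(groups[0])    # one team per player in a class

    def extend(option, used, start):
        if len(option) == teams_needed:
            yield list(option)
            return
        for i in range(start, len(teams)):
            team = teams[i]
            # Only keep teams that share no player with those already chosen.
            if used.isdisjoint(team):
                yield from extend(option + [team], used | set(team), i + 1)

    yield from extend([], set(), 0)

groups = [
    ['Player7', 'Player8'],  # A
    ['Player3', 'Player4'],  # B
    ['Player5', 'Player6'],  # C
    ['Player1', 'Player2'],  # D
]
options = list(disjoint_options(groups))
print(len(options))  # 8 ways to form two teams from these eight players
for team in options[0]:
    print(team)
```

The search does not prune dead ends early, which is fine for a handful of players per class but may become slow for larger groups.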

The following approach uses a cartesian product and then groups twice to distribute the players into teams with a set of unique handicaps.

import pandas as pd
from io import StringIO

print(pd.__version__)
pd.options.display.max_rows = 664

csvdata = StringIO("""handicap,name,hcp_ABCD
24,Player1,D
21,Player2,D
8,Player3,B
14,Player4,C
20,Player5,D
13,Player6,C
-1,Player7,A
5,Player8,A
8,Player9,B
6,Player10,B
20,Player11,D
15,Player12,C
0,Player13,A
12,Player14,C
0,Player15,A
10,Player16,B""")

df=pd.read_csv(csvdata)

# Generate all possible groups
# https://stackoverflow.com/questions/53699012/performant-cartesian-product-cross-join-with-pandas
def cartesian_product(left, right):
    return left.assign(key=1).merge(right.assign(key=1), on='key').drop(columns='key')

df = cartesian_product(df, df.copy())
# Number the players 0..3 within each (name_x, hcp_ABCD_y) group.
df['distribute'] = df.groupby(['name_x', 'hcp_ABCD_y']).cumcount()
df['team'] = df.groupby(['name_x', 'distribute']).ngroup()
print(df[['handicap_y','name_y','hcp_ABCD_y','team']].sort_values(['team']))

     handicap_y    name_y hcp_ABCD_y  team
0            24   Player1          D     0
2             8   Player3          B     0
3            14   Player4          C     0
6            -1   Player7          A     0
1            21   Player2          D     1
5            13   Player6          C     1
7             5   Player8          A     1
8             8   Player9          B     1
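
The first teams in this output pair up the players by their position within each handicap class: team 0 gets the first A, B, C and D player listed, team 1 the second, and so on. For a single option, that idea can be sketched without pandas. This is only an illustration that assumes the same 16 rows as in the question; `by_class` and `teams` are names chosen here.

```python
from collections import defaultdict

players = [
    (24, 'Player1', 'D'), (21, 'Player2', 'D'), (8, 'Player3', 'B'),
    (14, 'Player4', 'C'), (20, 'Player5', 'D'), (13, 'Player6', 'C'),
    (-1, 'Player7', 'A'), (5, 'Player8', 'A'), (8, 'Player9', 'B'),
    (6, 'Player10', 'B'), (20, 'Player11', 'D'), (15, 'Player12', 'C'),
    (0, 'Player13', 'A'), (12, 'Player14', 'C'), (0, 'Player15', 'A'),
    (10, 'Player16', 'B'),
]

# Collect the names per handicap class, keeping the table order.
by_class = defaultdict(list)
for handicap, name, hcp_class in players:
    by_class[hcp_class].append(name)

# Team k takes the k-th player listed in each class.
teams = list(zip(*(by_class[c] for c in 'ABCD')))

for number, team in enumerate(teams):
    print('team', number, team)
```

This yields one valid split only; the remaining options come from reordering the players within the B, C and D classes.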
