Python Pandas: Reduce DataFrame to Unique Combinations

Question

I have a data set that lists some basketball player names along with their positions. With that data I have created a DataFrame that lists all possible lineup combinations. That all works just fine. My issue is: Since some players are eligible at multiple positions, that DataFrame includes records that have the same set of players, but listed at different positions. Here's a small example from the dataframe:

PG	SG	SF	PF	C	G	F	UTIL
Luka Doncic	Tim Hardaway Jr.	Dillon Brooks	Keldon Johnson	Xavier Tillman Sr.	Tyus Jones	DeMar DeRozan	Bradley Beal
Tyus Jones	Dillon Brooks	Tim Hardaway Jr.	DeMar DeRozan	Xavier Tillman Sr.	Luka Doncic	Keldon Johnson	Bradley Beal
Tyus Jones	Bradley Beal	Keldon Johnson	DeMar DeRozan	Xavier Tillman Sr.	Tim Hardaway Jr.	Brandon Clarke	Luka Doncic
Tyus Jones	Tim Hardaway Jr.	Keldon Johnson	DeMar DeRozan	Brandon Clarke	Bradley Beal	Xavier Tillman Sr.	Luka Doncic
Luka Doncic	Tim Hardaway Jr.	Kyle Anderson	Keldon Johnson	Jonas Valanciunas	Tyus Jones	Xavier Tillman Sr.	Bradley Beal
Luka Doncic	Bradley Beal	Keldon Johnson	Kyle Anderson	Jonas Valanciunas	Tyus Jones	Xavier Tillman Sr.	Tim Hardaway Jr.

As you can see, records 1 and 2 contain the same players, listed at different positions. The same is true of records 3 and 4, and of records 5 and 6. Note: this is a simplified example; there are many more lineups with the same players. I need each unique set of players, regardless of position, to be represented by one record. It doesn't matter whether the first or last record with that combination of players is kept. So how do I reduce the dataframe above to something like the dataframe below? I'll also need to reset the index once the dataframe is reduced.

PG	SG	SF	PF	C	G	F	UTIL
Luka Doncic	Tim Hardaway Jr.	Dillon Brooks	Keldon Johnson	Xavier Tillman Sr.	Tyus Jones	DeMar DeRozan	Bradley Beal
Tyus Jones	Bradley Beal	Keldon Johnson	DeMar DeRozan	Xavier Tillman Sr.	Tim Hardaway Jr.	Brandon Clarke	Luka Doncic
Luka Doncic	Tim Hardaway Jr.	Kyle Anderson	Keldon Johnson	Jonas Valanciunas	Tyus Jones	Xavier Tillman Sr.	Bradley Beal

Thank you very much in advance!

Answer 1

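You can group by an order-independent key built from each row, then take the first or last row of each group. The session below builds the key with tuple(set(x)). With string data such as player names, tuple(sorted(x)) is a safer key, because two equal sets are not guaranteed to iterate in the same order: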

In [16]: df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [2, 1, 2, 5, 4], 'c': [3, 3, 1, 6, 6]})

In [17]: df
Out[17]:
   a  b  c
0  1  2  3
1  2  1  3
2  3  2  1
3  4  5  6
4  5  4  6

In [18]: df.groupby(df.apply(lambda x: tuple(set(x)), axis=1)).first()
Out[18]:
           a  b  c
(1, 2, 3)  1  2  3
(4, 5, 6)  4  5  6

In [19]: df.groupby(df.apply(lambda x: tuple(set(x)), axis=1)).last()
Out[19]:
           a  b  c
(1, 2, 3)  3  2  1
(4, 5, 6)  5  4  6

You can also clear that index with .reset_index(drop=True) at the end.
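Applied to the lineup data from the question, the same idea might look like the sketch below. It is one possible variant rather than the exact code above: it uses a sorted tuple as the key and a boolean mask from `duplicated()` instead of `groupby`, and the variable names (`lineups`, `key`, `unique_lineups`) are only illustrative. Only the first four records from the question are included to keep it short.

```python
import pandas as pd

# First four lineups from the question (records 1-2 and 3-4 share the same players).
columns = ["PG", "SG", "SF", "PF", "C", "G", "F", "UTIL"]
rows = [
    ["Luka Doncic", "Tim Hardaway Jr.", "Dillon Brooks", "Keldon Johnson",
     "Xavier Tillman Sr.", "Tyus Jones", "DeMar DeRozan", "Bradley Beal"],
    ["Tyus Jones", "Dillon Brooks", "Tim Hardaway Jr.", "DeMar DeRozan",
     "Xavier Tillman Sr.", "Luka Doncic", "Keldon Johnson", "Bradley Beal"],
    ["Tyus Jones", "Bradley Beal", "Keldon Johnson", "DeMar DeRozan",
     "Xavier Tillman Sr.", "Tim Hardaway Jr.", "Brandon Clarke", "Luka Doncic"],
    ["Tyus Jones", "Tim Hardaway Jr.", "Keldon Johnson", "DeMar DeRozan",
     "Brandon Clarke", "Bradley Beal", "Xavier Tillman Sr.", "Luka Doncic"],
]
lineups = pd.DataFrame(rows, columns=columns)

# Order-independent key: the sorted tuple of player names in each row.
# Sorting (rather than relying on set iteration order) makes equal lineups
# produce identical keys.
key = lineups.apply(lambda row: tuple(sorted(row)), axis=1)

# Keep the first row seen for each key, then renumber the index from 0.
unique_lineups = lineups[~key.duplicated()].reset_index(drop=True)

print(unique_lineups)
```

Passing `keep="last"` to `duplicated()` would keep the last record of each combination instead of the first.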
