I have a Pandas dataframe with sport results per tournament as follows (simplified):
Tournament WinnerName LoserName
t1 A X
t1 B Y
t1 C Y
t2 A X
t2 B Y
t2 C Y
In a dictionary I have information about the players' ranks per tournament:
Tournament Player Rank
t1 A 1
t1 B 7
t1 C 70
t2 A 11
t2 B 1
t2 C 100
Now I want to know how often per tournament the winner of a match is ranked in one of these categories: a) between 1 and 10, b) between 11 and 49, c) greater than 49.
So the result could either look like this:
Tournament WinnerName LoserName Group
t1 A X a
t1 B Y a
t1 C Y c
t2 A X b
t2 B Y a
t2 C Y c
or like this:
Tournament WinnerName LoserName GroupA GroupB GroupC
t1 A X 1 0 0
t1 B Y 1 0 0
t1 C Y 0 0 1
t2 A X 0 1 0
t2 B Y 1 0 0
t2 C Y 0 0 1
After that I can easily count the occurrences per column. But currently I am stuck on achieving either of the two results above. I know it should work somehow with apply or transform, but unfortunately I don't know exactly how. Maybe there is an even better solution to achieve this?
Thank you.
From the Rank column (r below) you can use pd.cut and pd.get_dummies:
In [11]: r
Out[11]:
0 1
1 7
2 70
3 11
4 1
5 100
Name: Rank, dtype: int64
In [12]: pd.cut(r, [0, 10, 49, 100], include_lowest=True)
Out[12]:
0 [0, 10]
1 [0, 10]
2 (49, 100]
3 (10, 49]
4 [0, 10]
5 (49, 100]
Name: Rank, dtype: category
Categories (3, object): [[0, 10] < (10, 49] < (49, 100]]
In [13]: pd.get_dummies(pd.cut(r, [0, 10, 49, 100], include_lowest=True))
Out[13]:
[0, 10] (10, 49] (49, 100]
0 1 0 0
1 1 0 0
2 0 0 1
3 0 1 0
4 1 0 0
5 0 0 1
Now you can join these columns with your original DataFrames (for example with DataFrame.join or pd.concat, which align on the index).
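To tie this back to the match results, here is a minimal sketch of the whole pipeline. It assumes the rank dictionary is keyed by (tournament, player) tuples, which the question does not actually specify; the names matches, ranks, rank_df and merged are only illustrative. It uses labels in pd.cut to get the a/b/c groups directly, and np.inf as the last bin edge so that ranks above 100 still fall into group c.

```python
import numpy as np
import pandas as pd

# Match results, as in the question.
matches = pd.DataFrame({
    "Tournament": ["t1", "t1", "t1", "t2", "t2", "t2"],
    "WinnerName": ["A", "B", "C", "A", "B", "C"],
    "LoserName": ["X", "Y", "Y", "X", "Y", "Y"],
})

# Assumption: the rank dictionary is keyed by (tournament, player).
ranks = {
    ("t1", "A"): 1, ("t1", "B"): 7, ("t1", "C"): 70,
    ("t2", "A"): 11, ("t2", "B"): 1, ("t2", "C"): 100,
}

# Turn the dictionary into a DataFrame that can be merged on the winner.
rank_df = pd.DataFrame(
    [(t, p, r) for (t, p), r in ranks.items()],
    columns=["Tournament", "WinnerName", "Rank"],
)

# A left merge keeps every match, in the original order.
merged = matches.merge(rank_df, on=["Tournament", "WinnerName"], how="left")

# Bins are (0, 10], (10, 49], (49, inf], labelled a, b, c.
merged["Group"] = pd.cut(
    merged["Rank"], bins=[0, 10, 49, np.inf], labels=["a", "b", "c"]
)

# Count how often each group wins per tournament.
counts = pd.crosstab(merged["Tournament"], merged["Group"])

print(merged)
print(counts)
```

The merged frame corresponds to the first desired layout (one Group column); pd.get_dummies(merged["Group"]) would give the second, and the crosstab gives the per-tournament counts without the intermediate dummy columns.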