I've tried using value_counts() with groupby(), but so far haven't succeeded. This is the DataFrame:
user | game_result | other_columns
--------------------------------------
john | win | ...
john | lose | ...
kim | lose | ...
alice | draw | ...
... ... ...
How could I get a result like this? (Counting occurrences of each result separately for each user)
user | win | lose | draw
--------------------------------------
john | 32 | 30 | 3
kim | 52 | 50 | 2
alice | 24 | 12 | 0
... ... ... ...
(Or it could be transposed, I don't mind)
Also, what would be an efficient way to convert that to a DataFrame of percentages?
You can use pandas.pivot_table(...):
df["_dummy"]=1
df.pivot_table(index="user", columns="game_result", values="_dummy", aggfunc="sum").fillna(0).astype("int")
For the test data:
#df
user game_result
0 john win
1 kim draw
2 alice draw
3 john lose
4 kim win
5 john lose
#pivot_table
game_result draw lose win
user
alice 1 0 0
john 0 2 1
kim 1 0 1
Ref: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html
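As an alternative that avoids the helper column, pandas.crosstab can build the same count table directly, and its normalize="index" option covers the percentages part of the question. This is only a sketch: the sample data below is made up for illustration and is not the asker's real data.

```python
import pandas as pd

# Made-up sample data, for illustration only
df = pd.DataFrame({
    "user": ["john", "kim", "alice", "john", "kim", "john"],
    "game_result": ["win", "draw", "draw", "lose", "win", "lose"],
})

# Count how often each result occurs per user (missing combinations become 0)
counts = pd.crosstab(df["user"], df["game_result"])

# normalize="index" divides each row by its row total; scale to percentages
percentages = pd.crosstab(df["user"], df["game_result"], normalize="index") * 100

print(counts)
print(percentages.round(1))
```

Each row of percentages sums to 100, so the values read as the share of each result within a single user's games.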
I'd use pandas and SQL (via the pandasql package):
import pandasql as ps
import pandas as pd
mydata = pd.DataFrame([
["john","lose"]
,["kim","win"]
,["john","win"]
,["john","win"]
,["kim","draw"]
,["kim","win"]]
,columns=["user","game_result"])
mysql = """
with counts as (
select
user
,case when game_result='win' then 1 else 0 end as 'win'
,case when game_result='lose' then 1 else 0 end as 'lose'
,case when game_result='draw' then 1 else 0 end as 'draw'
from mydata
)
select user
,sum(win) as win
,sum(lose) as lose
,sum(draw) as draw
from counts
group by user
"""
print(ps.sqldf(mysql))
Starting with a dummy dataset:
import pandas as pd
df = pd.DataFrame({'user': ['john', 'john', 'alice', 'alice'], 'game_result':['win', 'win', 'lose','win']})
user game_result
0 john win
1 john win
2 alice lose
3 alice win
The first step would be to count the number of wins and losses for each player.
counts = df.groupby(['user', 'game_result']).size().reset_index(name='count')
This gives:
user game_result count
0 alice lose 1
1 alice win 1
2 john win 2
We'll then pivot the data to have users as rows, game_result as columns and counts as the values:
# Keyword arguments are required here in pandas 2.0 and later
result = counts.pivot(index='user', columns='game_result', values='count').reset_index()
result = result.fillna(0)
Which gives:
game_result user lose win
0 alice 1.0 1.0
1 john 0.0 2.0
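For the percentages part of the question, one possible approach is to divide each row of the count table by its row total. The sketch below reuses the same dummy dataset and uses unstack(fill_value=0) instead of pivot plus fillna, which is a substitution on my part that keeps the counts as integers.

```python
import pandas as pd

# Same dummy dataset as above
df = pd.DataFrame({
    "user": ["john", "john", "alice", "alice"],
    "game_result": ["win", "win", "lose", "win"],
})

# Count each result per user; unstack moves game_result into the columns
counts = df.groupby(["user", "game_result"]).size().unstack(fill_value=0)

# Divide every row by its own total, then scale to percentages
percentages = counts.div(counts.sum(axis=1), axis=0) * 100

print(counts)
print(percentages)
```

Here the user names stay in the index rather than being turned into a column, which keeps the division restricted to the numeric count columns.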