
Number of occurrences in a DataFrame grouped by other column values

I've tried using value_counts() with groupby() but so far haven't succeeded. This is the DataFrame:

 user | game_result  | other_columns
--------------------------------------
john  |   win        |     ...
john  |   lose       |     ...
kim   |   lose       |     ...
alice |   draw       |     ...
...       ...              ...

How could I get a result like this? (Counting the occurrences of each result separately for each user.)

      |     win   |   lose   | draw
--------------------------------------
john  |   32      |    30    |   3
kim   |   52      |    50    |   2
alice |   24      |    12    |   0
 ...      ...          ...      ...

(It could also be transposed, I don't mind.)
Also, what would be an efficient way to convert that to a DataFrame of percentages?

You can use pandas.pivot_table(...):

# helper column of ones, so that summing it counts the rows in each group
df["_dummy"] = 1
df.pivot_table(index="user", columns="game_result", values="_dummy", aggfunc="sum").fillna(0).astype("int")

For the test data:

#df
    user game_result
0   john         win
1    kim        draw
2  alice        draw
3   john        lose
4    kim         win
5   john        lose

#pivot_table
game_result  draw  lose  win
user
alice           1     0    0
john            0     2    1
kim             1     0    1

Ref: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.pivot_table.html
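As a possible alternative that avoids the helper column, and that also covers the percentage part of the question, here is a minimal sketch using pandas.crosstab. The sample frame is made up for illustration, and the variable names counts and percentages are assumptions, not part of the original answer.

```python
import pandas as pd

# Made-up sample data, for illustration only
df = pd.DataFrame({
    "user": ["john", "kim", "alice", "john", "kim", "john"],
    "game_result": ["win", "draw", "draw", "lose", "win", "lose"],
})

# Count each result per user; combinations that never occur show up as 0
counts = pd.crosstab(df["user"], df["game_result"])

# normalize="index" divides each row by its row total; * 100 turns shares into percentages
percentages = pd.crosstab(df["user"], df["game_result"], normalize="index") * 100

print(counts)
print(percentages.round(1))
```

Each row of percentages sums to 100, so the values read as "share of this user's games". Passing normalize="columns" instead would give the share of each result across users.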

I'd use pandas together with SQL, via the pandasql package:

import pandasql as ps
import pandas as pd

mydata = pd.DataFrame([
["john","lose"]
,["kim","win"]
,["john","win"]
,["john","win"]
,["kim","draw"]
,["kim","win"]]
,columns=["user","game_result"])

mysql = """
with counts as (
select
user
,case when game_result='win' then 1 else 0 end as 'win'
,case when game_result='lose' then 1 else 0 end as 'lose'
,case when game_result='draw' then 1 else 0 end as 'draw'
from mydata
)
select user
,sum(win) as win
,sum(lose) as lose
,sum(draw) as draw
from counts
group by user
"""

print(ps.sqldf(mysql))

Starting with a dummy dataset:

df = pd.DataFrame({'user': ['john', 'john', 'alice', 'alice'], 'game_result':['win', 'win', 'lose','win']})
    user game_result
0   john         win
1   john         win
2  alice        lose
3  alice         win

The first step would be to count the number of wins and losses for each player.

counts = df.groupby(['user', 'game_result']).size().reset_index(name='count')

This gives:

    user game_result  count
0  alice        lose      1
1  alice         win      1
2   john         win      2

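We'll then pivot the data to have users as rows, game_result values as columns and the counts as the values. (The arguments are passed by keyword because recent pandas versions no longer accept them positionally in pivot.)

result = counts.pivot(index='user', columns='game_result', values='count').reset_index()
# users with no rows for a given result get NaN after the pivot; replace it with 0
result = result.fillna(0)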

Which gives:

game_result   user  lose  win
0            alice   1.0  1.0
1             john   0.0  2.0
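For the percentage part of the question, one possible sketch is to build the count table with groupby and unstack and then divide every row by its total. This is only an illustration on the same dummy data; the names counts and percentages are assumptions here, and unstack is swapped in for the pivot step above.

```python
import pandas as pd

# Same dummy dataset as above
df = pd.DataFrame({
    "user": ["john", "john", "alice", "alice"],
    "game_result": ["win", "win", "lose", "win"],
})

# Count rows per (user, game_result), then move game_result into the columns;
# fill_value=0 keeps the counts as integers instead of introducing NaN
counts = df.groupby(["user", "game_result"]).size().unstack(fill_value=0)

# Divide each row by that user's total number of games and scale to percent
percentages = counts.div(counts.sum(axis=1), axis=0) * 100

print(counts)
print(percentages)
```

Dividing with axis=0 aligns the row totals on the user index, so each user's percentages add up to 100.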

