How to calculate the ratio of values in a pandas dataframe column?

Question

I'm new to pandas and decided to learn it by playing around with some data I pulled from my favorite game's API. I have a dataframe with two columns, "playerId" and "winner", like so:

playerStatus:
______________________
   playerId   winner
0    1848      True
1    1988      False
2    3543      True
3    1848      False
4    1988      False
...

Each row represents a match the player participated in. My goal is to either transform this dataframe or create a new one such that the win percentage for each playerId is calculated. For example, the above dataframe would become:

playerWinsAndTotals
_________________________________________
   playerId   wins  totalPlayed   winPct
0    1848      1        2         50.0000
1    1988      0        2         0.0000
2    3543      1        1         100.0000
...

It took quite a while of reading the pandas docs, but I managed to achieve this by creating two different tables (one with the number of wins for each player, one with the total games for each player), merging them, and then taking the ratio of wins to games played.

Creating the "wins" dataframe:

temp_df = playerStatus[['playerId', 'winner']].value_counts().reset_index(name='wins')
onlyWins = temp_df[temp_df['winner'] == True][['playerId', 'wins']]
onlyWins
_________________________
    playerId    wins
1     1670       483
3     1748       474
4     2179       468
6     4006       434
8     1668       392
...

Creating the "totals" dataframe:

totalPlayed = playerStatus['playerId'].value_counts().reset_index(name='totalCount').rename(columns={'index': 'playerId'})
totalPlayed
____________________

   playerId   totalCount
0    1670        961
1    1748        919
2    1872        877
3    4006        839
4    2179        837
...

Finally, merging them and adding the "winPct" column:

playerWinsAndTotals = onlyWins.merge(totalPlayed, on='playerId', how='left')
playerWinsAndTotals['winPct'] = playerWinsAndTotals['wins']/playerWinsAndTotals['totalCount'] * 100
playerWinsAndTotals
_____________________________________________

   playerId   wins   totalCount     winPct
0    1670      483      961       50.260146
1    1748      474      919       51.577802
2    2179      468      837       55.913978
3    4006      434      839       51.728248
4    1668      392      712       55.056180
...

Now, the reason I'm posting this here is that I know I'm not taking full advantage of what pandas has to offer. Creating and merging two different dataframes just to find the ratio of player wins seems unnecessary. I feel like I took the "scenic" route on this one.

To anyone more experienced than me, how would you tackle this problem?

Answer 1

We can take advantage of the way Boolean values are handled mathematically (True being 1 and False being 0) and use three aggregation functions per group (groupby aggregate): sum, count and mean. We can also use Named Aggregation to create and rename the columns in one step:

df = (
    df.groupby('playerId', as_index=False)
        .agg(wins=('winner', 'sum'),
             totalCount=('winner', 'count'),
             winPct=('winner', 'mean'))
)
# Scale up winPct
df['winPct'] *= 100

df:

   playerId  wins  totalCount  winPct
0      1848     1           2    50.0
1      1988     0           2     0.0
2      3543     1           1   100.0

DataFrame and imports:

import pandas as pd

df = pd.DataFrame({
    'playerId': [1848, 1988, 3543, 1848, 1988],
    'winner': [True, False, True, False, False]
})
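
As a small illustration of why sum, count and mean are enough here, the sketch below applies them to a single boolean Series. The three-match sample is made up for the example and is not part of the original data.

```python
import pandas as pd

# Hypothetical sample for one player: 3 matches, 1 win.
winner = pd.Series([True, False, False], name="winner")

wins = int(winner.sum())        # True counts as 1, False as 0
total = int(winner.count())     # number of non-null entries
win_pct = winner.mean() * 100   # mean of 0/1 values is the win fraction

print(wins, total, round(win_pct, 2))
```

The groupby version in this answer does the same thing once per playerId.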

Answer 2

You can try something like this:

import pandas as pd
df = pd.read_csv('data.csv')

# If for any reason winner column is a string and not a boolean try
# import numpy as np
# df['winner'] = np.where(df['winner'] == 'True', 1, 0)

df = df.groupby('playerId')['winner'].agg(['count', 'sum'])
df['percentage'] = 100 * df['sum'] / df['count']
df = df.rename(columns={'count': 'total', 'sum': 'wins'})
print(df)

prints

          total  wins  percentage
playerId                   
1848          2     1        50.0
1988          2     0         0.0
3543          1     1       100.0

Data I used

playerId,winner
1848,True
1988,False
3543,True
1848,False
1988,False
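
If the winner column does arrive as text, one possible way to handle it without numpy is a plain comparison, sketched below. This assumes the strings are exactly 'True' and 'False'; the inline DataFrame stands in for the CSV so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical data where 'winner' was read in as text rather than bool.
df = pd.DataFrame({
    'playerId': [1848, 1988, 3543, 1848, 1988],
    'winner': ['True', 'False', 'True', 'False', 'False'],
})

# Element-wise comparison gives a real boolean column.
# Note: any value other than the exact string 'True' becomes False.
df['winner'] = df['winner'].eq('True')

stats = df.groupby('playerId')['winner'].agg(['count', 'sum'])
stats['percentage'] = 100 * stats['sum'] / stats['count']
print(stats)
```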

Answer 3

In your case, taking the mean is enough to get the win percentage (as a fraction between 0 and 1):

out = df.groupby('playerId')['winner'].agg(['sum','count','mean'])
Out[22]: 
          sum  count  mean
playerId                  
1848        1      2   0.5
1988        0      2   0.0
3543        1      1   1.0
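
To turn that fraction into a percentage, the mean can be scaled by 100. A minimal sketch on the sample data from the question; the name winPct is taken from the question's desired output:

```python
import pandas as pd

df = pd.DataFrame({
    'playerId': [1848, 1988, 3543, 1848, 1988],
    'winner': [True, False, True, False, False],
})

# The mean of a boolean column is the fraction of True values per group;
# multiplying by 100 turns it into a percentage.
win_pct = df.groupby('playerId')['winner'].mean().mul(100).rename('winPct')
print(win_pct)
```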

Answer 4

Try:

import pandas as pd
import numpy as np

df = pd.DataFrame({'playerId': {0: 1848, 1: 1988, 2: 3543, 3: 1848, 4: 1988},
 'winner': {0: True, 1: False, 2: True, 3: False, 4: False}})

s = df.groupby('playerId')['winner'].apply(lambda x: (np.sum(x)/len(x)*100))

df = (
    df.groupby('playerId')
      .agg({'playerId': 'count', 'winner': 'sum'})
      .rename(columns={'winner': 'wins', 'playerId': 'totalPlayed'})
      .reset_index()
)

df['winPct'] = df['playerId'].map(s)

df = df[['playerId', 'wins', 'totalPlayed', 'winPct']]

print(df)

   playerId  wins  totalPlayed  winPct
0      1848     1            2    50.0
1      1988     0            2     0.0
2      3543     1            1   100.0

4 answers

Answer 1: score 6, accepted, 2021-08-16 01:59:56
Answer 2: score 3, 2021-08-16 01:37:11
Answer 3: score 3, 2021-08-16 02:11:42
Answer 4: score 0, 2021-08-16 02:51:13
