How to calculate ratio of values in a pandas dataframe column?

I'm new to pandas and decided to learn it by playing around with some data I pulled from my favorite game's API. I have a dataframe with two columns, "playerId" and "winner", like so:

playerStatus:
______________________
   playerId   winner
0    1848      True
1    1988      False
2    3543      True
3    1848      False
4    1988      False
...

Each row represents a match the player participated in. My goal is to either transform this dataframe or create a new one such that the win percentage for each playerId is calculated. For example, the above dataframe would become:

playerWinsAndTotals
_________________________________________
   playerId   wins  totalPlayed   winPct
0    1848      1        2         50.0000
1    1988      0        2         0.0000
2    3543      1        1         100.0000
...

It took quite a while of reading the pandas docs, but I managed to achieve this by creating two different tables (one with the number of wins for each player, one with the total games for each player), merging them, and then taking the ratio of wins to games played.

Creating the "wins" dataframe:

temp_df = playerStatus[['playerId', 'winner']].value_counts().reset_index(name='wins')
onlyWins = temp_df[temp_df['winner'] == True][['playerId', 'wins']]
onlyWins
_________________________
    playerId    wins
1     1670       483
3     1748       474
4     2179       468
6     4006       434
8     1668       392
...

Creating the "totals" dataframe:

totalPlayed = playerStatus['playerId'].value_counts().reset_index(name='totalCount').rename(columns={'index': 'playerId'})
totalPlayed
____________________

   playerId   totalCount
0    1670        961
1    1748        919
2    1872        877
3    4006        839
4    2179        837
...

Finally, merging them and adding the "winPct" column:

playerWinsAndTotals = onlyWins.merge(totalPlayed, on='playerId', how='left')
playerWinsAndTotals['winPct'] = playerWinsAndTotals['wins']/playerWinsAndTotals['totalCount'] * 100
playerWinsAndTotals
_____________________________________________

   playerId   wins   totalCount     winPct
0    1670      483      961       50.260146
1    1748      474      919       51.577802
2    2179      468      837       55.913978
3    4006      434      839       51.728248
4    1668      392      712       55.056180
...

Now, the reason I am posting this here is that I know I'm not taking full advantage of what pandas has to offer. Creating and merging two different dataframes just to find the ratio of player wins seems unnecessary. I feel like I took the "scenic" route on this one.

To anyone more experienced than me, how would you tackle this problem?

We can take advantage of the way Boolean values are handled mathematically (True counts as 1 and False as 0) and use three aggregation functions, sum, count and mean, per group (groupby aggregate). We can also use Named Aggregation to create and rename the columns in one step:

df = (
    df.groupby('playerId', as_index=False)
        .agg(wins=('winner', 'sum'),
             totalCount=('winner', 'count'),
             winPct=('winner', 'mean'))
)
# Scale up winPct
df['winPct'] *= 100

df:

   playerId  wins  totalCount  winPct
0      1848     1           2    50.0
1      1988     0           2     0.0
2      3543     1           1   100.0

DataFrame and imports:

import pandas as pd

df = pd.DataFrame({
    'playerId': [1848, 1988, 3543, 1848, 1988],
    'winner': [True, False, True, False, False]
})
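
The Boolean-arithmetic point can be checked directly. This is a minimal sketch, assuming the same five-row sample frame; the names `sample` and `per_player` are made up for illustration and are not part of the answer above:

```python
import pandas as pd

sample = pd.DataFrame({
    'playerId': [1848, 1988, 3543, 1848, 1988],
    'winner': [True, False, True, False, False]
})

# True counts as 1 and False as 0, so sum() is the number of wins
# and mean() is wins divided by matches played.
print(sample['winner'].sum())   # 2
print(sample['winner'].mean())  # 0.4

per_player = sample.groupby('playerId')['winner'].agg(['sum', 'count', 'mean'])

# For every player, the group mean should match sum / count
print((per_player['mean'] == per_player['sum'] / per_player['count']).all())
```

This is why the 'mean' aggregation alone is enough for the win rate, while 'sum' and 'count' are only needed if the wins and totals should be shown as well.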

You can try something like this:

import pandas as pd
df = pd.read_csv('data.csv')

# If for any reason winner column is a string and not a boolean try
# import numpy as np
# df['winner'] = np.where(df['winner'] == 'True', 1, 0)

df = df.groupby('playerId')['winner'].agg(['count', 'sum'])
df['percentage'] = 100 * df['sum'] / df['count']
df = df.rename(columns={'count': 'total', 'sum': 'wins'})
print(df)

prints

          total  wins  percentage
playerId                   
1848          2     1        50.0
1988          2     0         0.0
3543          1     1       100.0

Data I used:

playerId,winner
1848,True
1988,False
3543,True
1848,False
1988,False
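
If the winner column does arrive as text, one possible variation on the commented-out np.where line is to map the expected spellings to real Booleans. This is only a sketch: the frame `raw`, the name `stats`, and the assumption that the strings are exactly 'True' and 'False' (apart from surrounding whitespace) are all hypothetical:

```python
import pandas as pd

# Hypothetical frame where 'winner' was read as strings instead of bool
raw = pd.DataFrame({
    'playerId': [1848, 1988, 3543, 1848, 1988],
    'winner': ['True', 'False', 'True', 'False', 'False']
})

# Map the two expected spellings to Booleans. Any other value becomes NaN
# rather than being treated as a loss, which is what the np.where
# comparison against 'True' would do.
raw['winner'] = raw['winner'].str.strip().map({'True': True, 'False': False})

# Same aggregation as above, with the final column names given directly
stats = raw.groupby('playerId')['winner'].agg(total='count', wins='sum')
stats['percentage'] = 100 * stats['wins'] / stats['total']
print(stats)
```

Note that count skips NaN, so rows with an unrecognized value would be left out of a player's total in this variant.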

In your case, mean alone yields the win rate (as a fraction between 0 and 1):

out = df.groupby('playerId')['winner'].agg(['sum','count','mean'])
Out[22]: 
          sum  count  mean
playerId                  
1848        1      2   0.5
1988        0      2   0.0
3543        1      1   1.0
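
To get from that fraction to the winPct column the question asks for, the mean can be scaled and named in the same chain. A minimal sketch, assuming the five-row sample frame from the question; the name `win_pct` is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'playerId': [1848, 1988, 3543, 1848, 1988],
    'winner': [True, False, True, False, False]
})

# mean() of a Boolean column is the win fraction per player;
# multiplying by 100 turns it into a percentage.
win_pct = (
    df.groupby('playerId')['winner']
      .mean()
      .mul(100)
      .rename('winPct')
      .reset_index()
)
print(win_pct)
```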

Try:

import pandas as pd
import numpy as np

df = pd.DataFrame({'playerId': {0: 1848, 1: 1988, 2: 3543, 3: 1848, 4: 1988},
 'winner': {0: True, 1: False, 2: True, 3: False, 4: False}})

s = df.groupby('playerId')['winner'].apply(lambda x: (np.sum(x)/len(x)*100))

df = (df.groupby('playerId')
        .agg({'playerId': 'count', 'winner': 'sum'})
        .rename(columns={'winner': 'wins', 'playerId': 'totalPlayed'})
        .reset_index()
)

df['winPct'] = df['playerId'].map(s)

df = df[['playerId', 'wins', 'totalPlayed', 'winPct']]

print(df)

   playerId  wins  totalPlayed  winPct
0      1848     1            2    50.0
1      1988     0            2     0.0
2      3543     1            1   100.0

