Calculate the percentage of values that meet multiple conditions in DataFrame

I have a DataFrame with information from every single March Madness game since 1985. Now I am trying to calculate the percentage of wins by the higher seed, by round. The main DataFrame looks like this:

[image of the DataFrame]

I thought the best way to do it would be to create separate functions. The first one deals with the scores: when Score is higher than Score.1 it returns Team, and when Score.1 is higher than Score it returns Team.1, with the results appended at the end of the function. The next one does the same for the seeds: when Seed.1 is higher than Seed it returns Team, and when Seed is higher than Seed.1 it returns Team.1, again appending the results. The last function handles the case where the two are equal.

def func1(x):
    if tourney.loc[tourney['Score']] > tourney.loc[tourney['Score.1']]:
        return tourney.loc[tourney['Team']]
    elif tourney.loc[tourney['Score.1']] > tourney.loc[tourney['Score']]:
        return tourney.loc[tourney['Team.1']]

func1(tourney.loc[tourney['Score']])

You can apply a row-wise function by applying a lambda function to the entire DataFrame with axis=1. This gives you a True/False column, 'low_seed_wins'.

With the new True/False column you can take the count and the sum per group (the count is the number of games, and the sum is the number of low-seed victories, since True counts as 1). Dividing the sum by the count gives the win ratio.

This only works because your lower-seeded teams are always on the left. If they are not, it will be a little more complex.

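import pandas as pd

# Sample data: the lower seed is always in the left-hand columns (Seed, Score)
df = pd.DataFrame([[1987,3,1,74,68,5],[1987,3,2,87,81,6],[1987,4,1,84,81,2],[1987,4,1,75,79,2]], columns=['Year','Round','Seed','Score','Score.1','Seed.1'])

# Row-wise check: True when the left-hand (lower seed) team outscored the other
df['low_seed_wins'] = df.apply(lambda row: row['Score'] > row['Score.1'], axis=1)

# Per year and round: number of games (count) and number of low-seed wins (sum)
df = df.groupby(['Year','Round'])['low_seed_wins'].agg(['count','sum']).reset_index()

df['ratio'] = df['sum'] / df['count']

df.head()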


   Year  Round  count  sum  ratio
0  1987      3      2  2.0    1.0
1  1987      4      2  1.0    0.5
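
If the lower seed is not always in the left-hand columns, the caveat above applies. One possible way to handle it is sketched below, on made-up sample data (the games frame and its values are assumptions, not the asker's data). It compares two flags: whether the left-hand team has the numerically lower seed, and whether the left-hand team won. The low seed wins exactly when the two flags agree. Games between equal seeds are dropped first, since neither side is the lower seed.

```python
import pandas as pd

# Hypothetical sample data: the lower seed may appear on either side.
games = pd.DataFrame(
    [[1987, 3, 1, 74, 68, 5],
     [1987, 3, 6, 81, 87, 2],
     [1987, 4, 1, 84, 81, 2],
     [1987, 4, 2, 79, 75, 1]],
    columns=['Year', 'Round', 'Seed', 'Score', 'Score.1', 'Seed.1'],
)

# Drop games between equal seeds; there is no "lower seed" to credit.
decided = games[games['Seed'] != games['Seed.1']].copy()

# True when the left-hand team has the numerically lower seed.
left_is_low = decided['Seed'] < decided['Seed.1']
# True when the left-hand team won.
left_won = decided['Score'] > decided['Score.1']

# The low seed won when both flags agree:
# left is low and won, or right is low and won.
decided['low_seed_wins'] = left_is_low == left_won

# The mean of a Boolean column is the fraction of True values.
ratio = (
    decided.groupby(['Year', 'Round'])['low_seed_wins']
    .mean()
    .reset_index(name='ratio')
)
print(ratio)
```

Using the mean of the Boolean column replaces the separate count, sum and division steps. The comparisons are vectorized, so no row-wise apply is needed.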

You should be able to calculate this by checking both conditions, for both the first and the second team. This returns a Boolean Series, the sum of which is the number of rows where it is True. Then just divide by the length of the whole DataFrame to get the fraction. Without test data it is hard to check exactly.

(
    ((tourney['Seed'] > tourney['Seed.1']) &
     (tourney['Score'] > tourney['Score.1'])) |
    ((tourney['Seed.1'] > tourney['Seed']) &
     (tourney['Score.1'] > tourney['Score']))
).sum() / len(tourney)
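
The expression above yields one overall figure, while the question asks for the percentage by round. A possible extension, sketched here on made-up data (the values in tourney below are assumptions, not the asker's data), is to build the same two-condition mask and group it by the Round column.

```python
import pandas as pd

# Made-up sample data using the column names from the question.
tourney = pd.DataFrame(
    {
        'Round': [1, 1, 1, 2],
        'Seed': [1, 12, 8, 1],
        'Score': [80, 70, 60, 65],
        'Seed.1': [16, 5, 9, 4],
        'Score.1': [60, 65, 72, 70],
    }
)

# Same two conditions as above: the team with the larger seed number
# also has the larger score, whichever side it is on.
higher_seed_wins = (
    ((tourney['Seed'] > tourney['Seed.1']) & (tourney['Score'] > tourney['Score.1']))
    | ((tourney['Seed.1'] > tourney['Seed']) & (tourney['Score.1'] > tourney['Score']))
)

# The mean of a Boolean Series is the fraction of True values,
# so grouping by Round gives a percentage per round.
pct_by_round = higher_seed_wins.groupby(tourney['Round']).mean() * 100
print(pct_by_round)
```

Games where both teams have the same seed count as False here. If they should be left out of the denominator, filter them out before grouping.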
