Pandas DataFrame filter calculation

Question

I have the following dataframe:

   student_id  gender      major  admitted
0       35377  female  Chemistry     False
1       56105    male    Physics      True
2       31441  female  Chemistry     False
3       51765    male    Physics      True
4       53714  female    Physics      True
5       50693  female  Chemistry     False
6       25946    male    Physics      True
7       27648  female  Chemistry      True
8       55247    male    Physics     False
9       35838    male    Physics      True

How would I calculate the admission rate for female physics majors?

Answer 1

import numpy as np
np.average(dat['admitted'][(dat['gender']=='female') & (dat['major']=='Physics')].values)

Working principle: (dat['gender']=='female') & (dat['major']=='Physics') creates a boolean pandas Series, which can be used to select the matching entries from the dat['admitted'] Series. The .values attribute extracts those entries into a numpy array. Finally, we take the average of those entries, which gives the admission rate.
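The steps described above can be traced one at a time. This is a minimal sketch that rebuilds the question's table by hand under the name dat (the same name the snippet above uses); the intermediate names mask, selected and rate are only illustrative.

```python
import numpy as np
import pandas as pd

# The question's data, typed in by hand; `dat` matches the name used in this answer.
dat = pd.DataFrame({
    "student_id": [35377, 56105, 31441, 51765, 53714,
                   50693, 25946, 27648, 55247, 35838],
    "gender": ["female", "male", "female", "male", "female",
               "female", "male", "female", "male", "male"],
    "major": ["Chemistry", "Physics", "Chemistry", "Physics", "Physics",
              "Chemistry", "Physics", "Chemistry", "Physics", "Physics"],
    "admitted": [False, True, False, True, True,
                 False, True, True, False, True],
})

# Step 1: boolean Series, True only where both conditions hold.
mask = (dat["gender"] == "female") & (dat["major"] == "Physics")

# Step 2: keep the matching 'admitted' entries as a numpy array.
selected = dat["admitted"][mask].values

# Step 3: the average of True/False values is the admission rate.
rate = np.average(selected)
print(rate)  # 1.0 (only row 4 matches, and that student was admitted)
```

With this particular table only one row satisfies both conditions, so the rate is computed from a single applicant.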

Answer 2

I think:

df_f = df[(df['gender']=='female') & (df['major']=='Physics')]
df_f['admitted'].mean()

The first part filters for female and Physics. Next, we calculate the mean.

Taking the mean may sound unintuitive, but mathematically it gives the admitted fraction (multiply by 100 for a percentage). Python treats boolean values as 0 and 1, so summing them and dividing by the count (which is what mean does) calculates the proportion of female students majoring in Physics who were admitted.
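To make that arithmetic concrete, here is a small sketch with a made-up boolean Series (the values are illustrative and are not the question's data). It shows that mean is the same as the sum of True values divided by the number of entries.

```python
import pandas as pd

# Hypothetical admission flags for four applicants.
admitted = pd.Series([True, False, True, True])

total = admitted.sum()    # True counts as 1, so this is 3
count = admitted.count()  # number of entries: 4
rate = admitted.mean()    # 3 / 4

print(total, count, rate)  # 3 4 0.75
```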

Answer 3

import numpy as np
import pandas as pd
df = pd.DataFrame({"gender":np.random.choice(["male","female"],[20]),
                   "admitted":np.random.choice([True,False],[20]),
                   "major":np.random.choice(["Chemistry","Physics"],[20])})

# Rows for admitted female Physics applicants, and for all female Physics applicants
phy_female_admitted = df.loc[(df["major"]=="Physics") & df["admitted"] & (df["gender"]=="female")]
phy_female_applied = df.loc[(df["major"]=="Physics") & (df["gender"]=="female")]

# Admitted count divided by applicant count
acceptance_rate = phy_female_admitted.shape[0]/phy_female_applied.shape[0]

A slightly more expanded answer, but it works in basically the same way as DZurico's.

Ignore the lines where I create a dataframe and use your own data instead.
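One thing to watch with the count-divided-by-count approach: if a group has no applicants, the division raises ZeroDivisionError. The sketch below wraps the same idea in a hypothetical helper, admission_rate (not part of pandas), that returns NaN in that case; the small dataframe is made up for illustration.

```python
import pandas as pd

def admission_rate(df, gender, major):
    """Hypothetical helper: admitted / applied for one gender and major."""
    applied = df[(df["gender"] == gender) & (df["major"] == major)]
    if applied.empty:
        # No applicants in this group, so the rate is undefined.
        return float("nan")
    admitted = applied[applied["admitted"]]
    return len(admitted) / len(applied)

# Made-up data, not the question's table.
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male"],
    "major": ["Physics", "Physics", "Physics", "Chemistry"],
    "admitted": [True, False, False, True],
})

rate_fp = admission_rate(df, "female", "Physics")    # 1 of 2 admitted
rate_fc = admission_rate(df, "female", "Chemistry")  # no applicants
print(rate_fp, rate_fc)  # 0.5 nan
```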

Answer 4

A solution that computes all admission rates using groupby with GroupBy.size, then GroupBy.transform with sum:

a = df.groupby(['gender' ,'admitted', 'major']).size()
print (a)
gender  admitted  major    
female  False     Chemistry    3
        True      Chemistry    1
                  Physics      1
male    False     Physics      1
        True      Physics      4
dtype: int64

b = a.groupby(['gender' ,'major']).transform('sum')
print (b)
gender  admitted  major    
female  False     Chemistry    4
        True      Chemistry    4
                  Physics      1
male    False     Physics      5
        True      Physics      5
dtype: int64

c = a.div(b)
print (c)
gender  admitted  major    
female  False     Chemistry    0.75
        True      Chemistry    0.25
                  Physics      1.00
male    False     Physics      0.20
        True      Physics      0.80
dtype: float64

Select the row of c you need by passing a tuple:

print (c.loc[('female',True,'Physics')])
1.0

If you want all values in a DataFrame:

d = a.div(b).reset_index(name='rates')
print (d)
   gender  admitted      major  rates
0  female     False  Chemistry   0.75
1  female      True  Chemistry   0.25
2  female      True    Physics   1.00
3    male     False    Physics   0.20
4    male      True    Physics   0.80
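If only the admitted share per group is needed, a shorter route is to group by gender and major and take the mean of the boolean column directly. This is an alternative sketch, not the method used in this answer; it rebuilds the question's columns by hand, and the name rates is illustrative.

```python
import pandas as pd

# The relevant columns of the question's table, typed in by hand.
df = pd.DataFrame({
    "gender": ["female", "male", "female", "male", "female",
               "female", "male", "female", "male", "male"],
    "major": ["Chemistry", "Physics", "Chemistry", "Physics", "Physics",
              "Chemistry", "Physics", "Chemistry", "Physics", "Physics"],
    "admitted": [False, True, False, True, True,
                 False, True, True, False, True],
})

# Mean of the boolean 'admitted' column within each (gender, major) group.
rates = df.groupby(["gender", "major"])["admitted"].mean()

print(rates.loc[("female", "Physics")])  # 1.0
```

The result is a Series indexed by (gender, major), so it holds only the admitted share for each group rather than separate rows for admitted and not admitted.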
