How do I create a 'count' column based on a condition using Python pandas?

I used groupby as shown below, but it didn't work:

    import pandas as pd

    df = pd.DataFrame({'id': [1, 1, 2, 2, 3],
                       'testname': ['math', 'science', 'math', 'literature', 'math'],
                       'result': ['passed', 'failed', 'passed', 'passed', 'failed']})

    ndf = df.groupby(['id', 'testname'])['result'].count()

Example dataframe:

id  testname    result
1   math        passed
1   science     failed
2   math        passed
2   literature  passed
3   math        failed

Condition: add 1 to the count if a student (id) passed every exam they took; otherwise add 0.

Expected output: a single total. For the example above, the number of students who passed every exam is 1 (only id 2).

It seems you want to count the number of students who don't have any failed tests (that is, who passed everything). You are on the right track with grouping, but I'm not sure why you are grouping by id and testname.

Sometimes, in problems where you are looking for "the ones that don't have any negative results", it is easier to count the ones with any negative result and subtract that from the total number of ids. Here is an approach:
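To see why grouping by both columns does not help, here is a small sketch using the question's data. Each (id, testname) pair occurs only once, so every group has a count of 1 and nothing is aggregated per student. The name `ndf` comes from the question; the rest is only an illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 3],
                   'testname': ['math', 'science', 'math', 'literature', 'math'],
                   'result': ['passed', 'failed', 'passed', 'passed', 'failed']})

# One group per (id, testname) pair; each pair appears in exactly one row here
ndf = df.groupby(['id', 'testname'])['result'].count()

print(len(ndf))           # 5 groups, one per row
print((ndf == 1).all())   # True: every group holds a single result
```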

  1. find any row with a failure using filtering
  2. group that result by id, so we have a basis for counting everybody with a failure
  3. subtract that count from the original number of unique ids

Note: You could certainly chain some of these steps together; I just broke them apart for clarity.
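For reference, a chained version of the three steps might look roughly like the sketch below. It uses `nunique()` to count distinct ids, which is my substitution for taking `len()` of a groupby; the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 3],
                   'testname': ['math', 'science', 'math', 'literature', 'math'],
                   'result': ['passed', 'failed', 'passed', 'passed', 'failed']})

# Steps 1 and 2: keep the failed rows, then count each id with a failure once
ids_with_failures = df.loc[df['result'] == 'failed', 'id'].nunique()

# Step 3: subtract from the number of distinct ids
count = df['id'].nunique() - ids_with_failures

print(count)  # 1
```

Counting distinct ids rather than failed rows matters when a student fails more than one exam; otherwise that student would be subtracted several times.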

In [25]: df                                                                     
Out[25]: 
   id    testname  result
0   1        math  passed
1   1     science  failed
2   2        math  passed
3   2  literature  passed
4   3        math  failed

In [26]: failed_df = df[df['result']=='failed']                                 

In [27]: ids_with_failures = len(failed_df.groupby('id'))

In [28]: tot_ids = len(df.groupby('id'))                                        

In [29]: count = tot_ids - ids_with_failures                                    

In [30]: count                                                                  
Out[30]: 1

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 3],
                   'testname': ['math', 'science', 'math', 'literature', 'math'],
                   'result': ['passed', 'failed', 'passed', 'passed', 'failed']})

The function below returns True if all entries in a series are equal to 'passed'. In other words, if the student has at least one result that is not 'passed', it returns False.

def verify_all_exams_are_passed(results):
    return results.eq('passed').all()

Finally, apply the function to each student (id) to see which students have passed all exams.

>>> students_passed = df.groupby('id')['result'].apply(verify_all_exams_are_passed)

>>> students_passed
id
1    False
2     True
3    False
Name: result, dtype: bool

You can also sum this series directly to get the count of students who passed all exams:

>>> students_passed.sum()
1
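If a per-row column is wanted, as the question title suggests, one possible sketch is to broadcast the per-student result back onto the rows with `transform('all')`. The column name `passed_all` and the variable names are hypothetical, chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 3],
                   'testname': ['math', 'science', 'math', 'literature', 'math'],
                   'result': ['passed', 'failed', 'passed', 'passed', 'failed']})

# Boolean mask: True where the row is a pass
passed = df['result'].eq('passed')

# 1 on every row of a student who passed all exams, else 0
df['passed_all'] = passed.groupby(df['id']).transform('all').astype(int)

# Total number of students who passed everything
total = int(passed.groupby(df['id']).all().sum())

print(df['passed_all'].tolist())  # [0, 0, 1, 1, 0]
print(total)                      # 1
```

This avoids the custom function by comparing first and grouping the boolean mask, but the logic is the same as the `apply` version above.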
