How do I create a 'count' column based on a condition using Python pandas?
I have used groupby as shown below, but it didn't work:
import pandas as pd
df = pd.DataFrame({'id': [1, 1, 2, 2, 3], 'testname': ['math', 'science', 'math', 'literature', 'math'], 'result': ['passed', 'failed', 'passed', 'passed', 'failed']})
ndf = df.groupby(['id', 'testname'])['result'].count()
Example dataframe:
id  testname    result
1   math        passed
1   science     failed
2   math        passed
2   literature  passed
3   math        failed
Condition: add 1 to the count if the student (id) passed every exam they took; otherwise add 0.
Expected output: a single total. For the data above, the total number of students who passed everything is 1.
It seems you are looking to count the number of students who don't have any failed tests (passed everything). You are on the right track with grouping, but I'm not sure why you are grouping by both id and testname.
Sometimes, in problems like this where you are looking for "the ones that don't have any negative results", it is easier to count the ones with any negative result and subtract that from the total. Here is an approach:
Note: You could certainly chain some of this together; I just broke it apart for clarity.
In [25]: df
Out[25]:
   id    testname  result
0   1        math  passed
1   1     science  failed
2   2        math  passed
3   2  literature  passed
4   3        math  failed
In [26]: failed_df = df[df['result']=='failed']
In [27]: ids_with_failures = failed_df['id'].nunique()  # distinct ids, so several failures by one student count once
In [28]: tot_ids = len(df.groupby('id'))
In [29]: count = tot_ids - ids_with_failures
In [30]: count
Out[30]: 1
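As the note above says, the steps can be chained. The following is a minimal sketch of that idea, assuming the same example data as in the question. It counts distinct ids rather than failed rows, so a student with more than one failed exam is subtracted only once.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3],
    'testname': ['math', 'science', 'math', 'literature', 'math'],
    'result': ['passed', 'failed', 'passed', 'passed', 'failed'],
})

# Number of distinct ids that have at least one 'failed' result.
ids_with_failures = df.loc[df['result'].eq('failed'), 'id'].nunique()

# All distinct ids minus the ids with a failure.
count = df['id'].nunique() - ids_with_failures
print(count)  # 1
```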
import pandas as pd
df = pd.DataFrame({'id': [1, 1, 2, 2, 3],
                   'testname': ['math', 'science', 'math', 'literature', 'math'],
                   'result': ['passed', 'failed', 'passed', 'passed', 'failed']})
The function below returns True if all entries in a series are equal to 'passed'. In other words, if the student has at least one result that is not 'passed', it returns False.
def verify_all_exams_are_passed(results):
    return results.eq('passed').all()
Finally, apply the function to each student (id) to see which students have passed all exams.
>>> students_passed = df.groupby('id')['result'].apply(verify_all_exams_are_passed)
>>> students_passed
id
1    False
2     True
3    False
Name: result, dtype: bool
You can also sum this series directly to get the count of students who passed all exams:
>>> students_passed.sum()
1
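Since the question title asks for a column, here is a hedged sketch of one way to attach the per-student result to every row with groupby and transform. The column name passed_all and the variable n_passed_all are made up for illustration and are not from the original posts.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3],
    'testname': ['math', 'science', 'math', 'literature', 'math'],
    'result': ['passed', 'failed', 'passed', 'passed', 'failed'],
})

# True on every row of a student whose results are all 'passed'.
# 'passed_all' is a hypothetical column name chosen for this sketch.
df['passed_all'] = df['result'].eq('passed').groupby(df['id']).transform('all')

# Count each qualifying student once, not once per row.
n_passed_all = df.loc[df['passed_all'], 'id'].nunique()
print(n_passed_all)  # 1
```

Using transform keeps the original row count, which is handy if the flag is needed next to each exam; the apply-and-sum version above is shorter when only the total matters.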