How to count unique rows in a column based on multiple conditions in Python
I have a DataFrame that looks like this (the Treatment column can take many different string values; I simplified it for the question):
ID Position Treatment
--20AxECvv- 0 A
--20AxECvv- -1 A
--20AxECvv- -2 A
--h9INKewQf- 0 A
--h9INKewQf- -1 B
zZU7a@8jN 0 B
QUeSNEXmdB 0 C
QUeSNEXmdB -1 C
qu72Ql@h79 0 C
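To make the "exclusive treatment" condition concrete, here is a minimal sketch that rebuilds the sample data from the table above and counts the distinct treatments per ID. It is a reconstruction that assumes Position is an integer column and Treatment holds plain strings. In this sample only `--h9INKewQf-` has more than one treatment.

```python
import pandas as pd

# Sample data reconstructed from the table in the question
df = pd.DataFrame({
    "ID": ["--20AxECvv-", "--20AxECvv-", "--20AxECvv-",
           "--h9INKewQf-", "--h9INKewQf-",
           "zZU7a@8jN",
           "QUeSNEXmdB", "QUeSNEXmdB",
           "qu72Ql@h79"],
    "Position": [0, -1, -2, 0, -1, 0, 0, -1, 0],
    "Treatment": ["A", "A", "A", "A", "B", "B", "C", "C", "C"],
})

# Number of distinct treatments each ID received (repeats count once)
treatments_per_id = df.groupby("ID")["Treatment"].nunique()
print(treatments_per_id)
```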
I only want to keep the IDs with an exclusive treatment; in other words, keep each ID that received only one treatment, even if it received it several times. Afterwards, I want to count the number of IDs for each treatment.
The result would be:
ID Position Treatment
--20AxECvv- 0 A
--20AxECvv- -1 A
--20AxECvv- -2 A
zZU7a@8jN 0 B
QUeSNEXmdB 0 C
QUeSNEXmdB -1 C
qu72Ql@h79 0 C
And the counts:
A : 1
B : 1
C : 2
I have no idea how to solve this, maybe with a nested loop, but I am a beginner with Python/pandas. Thanks!
You can group by ID and keep only the groups in which the number of unique Treatment values is 1:
df1 = df.loc[df.groupby('ID').Treatment.filter(lambda x: x.nunique()==1).index]
Or, as @Igor Raush suggested, filter the whole groups directly:
df1 = df.groupby('ID').filter(lambda g: g.Treatment.nunique() == 1)
ID Position Treatment
0 --20AxECvv- 0 A
1 --20AxECvv- -1 A
2 --20AxECvv- -2 A
5 zZU7a@8jN 0 B
6 QUeSNEXmdB 0 C
7 QUeSNEXmdB -1 C
8 qu72Ql@h79 0 C
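A further variant, not part of the original answer, uses `transform("nunique")` to broadcast each ID's treatment count back onto its rows and then applies a boolean mask. It avoids calling a Python lambda once per group, which may be faster on large frames. A minimal sketch, rebuilding the sample data so it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["--20AxECvv-", "--20AxECvv-", "--20AxECvv-",
           "--h9INKewQf-", "--h9INKewQf-",
           "zZU7a@8jN",
           "QUeSNEXmdB", "QUeSNEXmdB",
           "qu72Ql@h79"],
    "Position": [0, -1, -2, 0, -1, 0, 0, -1, 0],
    "Treatment": ["A", "A", "A", "A", "B", "B", "C", "C", "C"],
})

# For every row, the number of distinct treatments of that row's ID
n_treatments = df.groupby("ID")["Treatment"].transform("nunique")

# Keep only rows whose ID received a single treatment (possibly several times)
df1 = df[n_treatments == 1]
print(df1)
```

The original index labels are preserved, so the result matches the `filter` output shown above row for row.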
And to get the number of unique IDs per treatment:
df1.groupby('Treatment').ID.nunique()
Treatment
A 1
B 1
C 2
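Because every ID left after filtering has exactly one treatment, an equivalent way to count is to keep one row per ID and call `value_counts` on Treatment. This is an alternative to `nunique`, not the answer's own method. The sketch is self-contained and reruns the `groupby(...).filter(...)` step from above:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["--20AxECvv-", "--20AxECvv-", "--20AxECvv-",
           "--h9INKewQf-", "--h9INKewQf-",
           "zZU7a@8jN",
           "QUeSNEXmdB", "QUeSNEXmdB",
           "qu72Ql@h79"],
    "Position": [0, -1, -2, 0, -1, 0, 0, -1, 0],
    "Treatment": ["A", "A", "A", "A", "B", "B", "C", "C", "C"],
})

# Keep only IDs with a single distinct treatment
df1 = df.groupby("ID").filter(lambda g: g["Treatment"].nunique() == 1)

# One row per ID is enough, since each ID maps to exactly one treatment
counts = df1.drop_duplicates("ID")["Treatment"].value_counts().sort_index()

# Same numbers via the approach in the answer
unique_ids = df1.groupby("Treatment")["ID"].nunique()
print(counts)
```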