How to create a new column in a Pandas DataFrame based on a combination of one and many columns
I have a data set that looks like this:
   Cond  Column_A  Column_B  Column_C  Cumulative_Count
0     1     -0.60     -0.12     -0.17                 1
1     0      0.30      0.70      0.98                 0
2     1     -0.45     -0.71     -0.99                 2
3     1      0.60      0.12      0.17                 1
4     0      0.20      0.80      0.60                 0
5     1      0.70      0.14      0.20                 1
I would like to create a column Cumulative_Count that counts occurrences of an event based on multiple conditions such as:
1) If Cond=1 and (Column_A<-0.5 or Column_A>0.5) then Cumulative_Count=Cumulative_Count+1
2) If Cond=1 and (Column_B<-0.5 or Column_B>0.5) then Cumulative_Count=Cumulative_Count+1
3) If Cond=1 and (Column_C<-0.5 or Column_C>0.5) then Cumulative_Count=Cumulative_Count+1
I would like to use NumPy arrays for this because my dataset is very large. I tried the code below; it does not throw an error, but the result is not correct. If possible, I also need it to work across all columns, because I have 50+ columns.
df['Cum_Count']=0
df['Cum_Count']=np.where((df['Cond']>0 & ((df['Column_A']<-0.5) | (df['Column_A']>0.5))), df['Cum_Count']+1, df['Cum_Count'])
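A likely reason the attempt above gives the wrong result is operator precedence: in Python, `&` binds more tightly than `>`, so `df['Cond']>0 & (...)` is parsed as `df['Cond'] > (0 & (...))`. The sketch below shows the single-column check with each comparison wrapped in its own parentheses. The DataFrame is rebuilt from the sample table, and the column name `A_flag` is made up for illustration. Note that even when fixed, this only tests Column_A and yields 0 or 1, not a count across columns.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Cond":     [1, 0, 1, 1, 0, 1],
    "Column_A": [-0.60, 0.30, -0.45, 0.60, 0.20, 0.70],
    "Column_B": [-0.12, 0.70, -0.71, 0.12, 0.80, 0.14],
    "Column_C": [-0.17, 0.98, -0.99, 0.17, 0.60, 0.20],
})

# Parenthesize every comparison so that & and | combine boolean Series
cond_on = df["Cond"] > 0
a_outside = (df["Column_A"] < -0.5) | (df["Column_A"] > 0.5)

# "A_flag" is a hypothetical name: 1 where both conditions hold, else 0
df["A_flag"] = np.where(cond_on & a_outside, 1, 0)
```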
You can do this by filtering the value columns and counting the matches per row:
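cond1 = df.filter(like='Column')  # every column whose name contains 'Column'
cond2 = df['Cond']
# count values outside [-0.5, 0.5] per row, then zero out rows where Cond is not set
df['count'] = (cond1.gt(0.5) | cond1.lt(-0.5)).sum(axis=1).where(cond2.gt(0), 0)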
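Since the question asks for NumPy arrays, the same idea can be sketched directly on the underlying array. This is a minimal sketch, assuming the value columns all start with the prefix `Column` and that Cond is 0 or 1; the sample DataFrame is rebuilt from the table in the question and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    "Cond":     [1, 0, 1, 1, 0, 1],
    "Column_A": [-0.60, 0.30, -0.45, 0.60, 0.20, 0.70],
    "Column_B": [-0.12, 0.70, -0.71, 0.12, 0.80, 0.14],
    "Column_C": [-0.17, 0.98, -0.99, 0.17, 0.60, 0.20],
})

# Assumption: every value column is named with the prefix "Column"
value_cols = [c for c in df.columns if c.startswith("Column")]
values = df[value_cols].to_numpy()

# True where a value lies outside the interval [-0.5, 0.5]
outside = (values < -0.5) | (values > 0.5)

# Rows where the condition flag is set
active = df["Cond"].to_numpy() > 0

# Per-row count of outside values, forced to 0 where Cond is not set
df["Cumulative_Count"] = outside.sum(axis=1) * active
```

Because the comparison runs over the whole 2-D array at once, the same code should work unchanged with 50+ value columns as long as they share the prefix.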