Pandas - How to create a column with 3 outputs based on conditions on multiple columns
I have a DataFrame df:
def fake_data():
    return {'Name': fake.name(),
            'Gender': random.choice(sex_list),
            'Address': fake.street_address(),
            'Nationality': 'Zimbabwean',
            'Account_Type': random.choice(accounts_list),
            'Age': random.randint(0, 2),
            'Education': random.random() > 0.5,
            'Employment': random.randint(0, 2),
            'Salary': random.randint(0, 2),
            'Employer_Stability': random.random() > 0.5,
            'Consistency': random.random() > 0.5,
            'Balance': random.randint(0, 2),
            'Residential_Status': random.random() > 0.5
            }
I want to create a column Service_Level that is 0, 1, or 2 depending on conditions on the following columns:
columns = ['Age','Education', 'Employment', 'Salary', 'Employer_Stability', 'Consistency', 'Balance', 'Residential_Status']
After reading some answers here, I tried the following to produce ['Service_Level'] = 0:
df['Service_Level'] = np.where((df['Age']==0)&(df['Education']==False)&(df['Employment']==0)&(df['Salary']==0)&(df['Employer_Stability']==False)&(df['Consistency']==False)&(df['Balance']==0)&(df['Residential_Status']==False),
(df['Age'])|(df['Education'])|(df['Employment'])|(df['Salary'])|(df['Employer_Stability'])|(df['Consistency'])|(df['Balance'])|(df['Residential_Status']), 0)
Then this for ['Service_Level'] = 1:
df['Service_Level'] = np.where((df['Age']==1)&(df['Education']==True)&(df['Employment']==1)&(df['Salary']==1)&(df['Employer_Stability']==False)&(df['Consistency']==True)&(df['Balance']==1)&(df['Residential_Status']==True),
(df['Age'])|(df['Education'])|(df['Employment'])|(df['Salary'])|(df['Employer_Stability'])|(df['Consistency'])|(df['Balance'])|(df['Residential_Status']), 1)
Then this for ['Service_Level'] = 2:
df['Service_Level'] = np.where((df['Age']==2)&(df['Education']==True)&(df['Employment']==2)&(df['Salary']==2)&(df['Employer_Stability']==True)&(df['Consistency']==True)&(df['Balance']==2)&(df['Residential_Status']==True),
(df['Age'])|(df['Education'])|(df['Employment'])|(df['Salary'])|(df['Employer_Stability'])|(df['Consistency'])|(df['Balance'])|(df['Residential_Status']), 2)
Unfortunately, I can't figure out how to combine these conditions so that I get 0, 1, or 2.
If it works, what happens to the states that do not follow those exact conditions? I would like then to also produce and output
You might need to use slicing in conjunction with np.where (which, by the way, takes three arguments: the condition, val1 (used where the condition is true), and val2 (used where it is false)).
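As a minimal illustration of that three-argument form, here is a sketch on a made-up Series. The data and the label strings are hypothetical and are not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the asker's DataFrame.
age = pd.Series([0, 1, 2, 0])

# Where the condition is True, take the second argument; otherwise take the third.
labels = np.where(age == 0, "zero", "non-zero")
print(labels)
```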
Your first statement would be:
df['Service_Level'] = np.where(condition_1, 0, 1)
This results in df['Service_Level'] holding 0 for the rows that meet the first condition and 1 otherwise.
Now mask the data to get only the rows where Service_Level is not 0:
df[df['Service_Level'] !=0]
On this subset you can apply the second condition with
np.where(condition_2, 1, 2)
to assign 1 to df['Service_Level'] where the condition is true and 2 to the rest of the rows.
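The two steps described above could be sketched as follows. This assumes a tiny hypothetical frame with only two of the eight columns, and condition_1 and condition_2 are shortened stand-ins for the full conditions in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame using only two of the columns, to keep it short.
df = pd.DataFrame({
    "Age": [0, 1, 2],
    "Education": [False, True, True],
})

# Assumed stand-ins for the full eight-column conditions in the question.
condition_1 = (df["Age"] == 0) & ~df["Education"]
condition_2 = (df["Age"] == 1) & df["Education"]

# Step 1: 0 where the first condition holds, 1 everywhere else.
df["Service_Level"] = np.where(condition_1, 0, 1)

# Step 2: among the rows that are not 0, assign 1 or 2 from the second condition.
mask = df["Service_Level"] != 0
df.loc[mask, "Service_Level"] = np.where(condition_2[mask], 1, 2)

print(df)
```

Writing the result back through df.loc with the mask updates the original frame; assigning to the sliced copy df[df['Service_Level'] != 0] would not.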
EDIT:
You can also nest a second np.where, with the second condition, inside the first one like this:
df['Service_Level'] = np.where(cond_1, 0, (np.where(cond_2, 1,2)))
For better readability, you may want to first save the conditions as cond_1 etc. and then use them in np.where.
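A sketch of saving the conditions first, again on a hypothetical two-column frame. It also shows what happens to a row that matches none of the conditions: with the nested np.where it silently falls into the last value, 2. np.select is a different NumPy function, not part of the answer above, that may be worth considering if such rows should be marked separately; cond_3 and the -1 marker are assumptions made for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; the last row matches none of the conditions below.
df = pd.DataFrame({
    "Age": [0, 1, 2, 2],
    "Education": [False, True, True, False],
})

# Shortened stand-ins for the eight-column conditions in the question.
cond_1 = (df["Age"] == 0) & ~df["Education"]   # level 0
cond_2 = (df["Age"] == 1) & df["Education"]    # level 1
cond_3 = (df["Age"] == 2) & df["Education"]    # level 2 (assumed for this sketch)

# Nested np.where: every row that is neither cond_1 nor cond_2 becomes 2,
# including the last row, which matches nothing.
nested = np.where(cond_1, 0, np.where(cond_2, 1, 2))

# np.select checks the conditions in order and uses `default` when none match.
# The -1 marker for unmatched rows is an arbitrary choice.
df["Service_Level"] = np.select([cond_1, cond_2, cond_3], [0, 1, 2], default=-1)

print(nested)
print(df)
```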