Pandas - How to create a column with 3 outputs based on conditions on multiple columns
I have a DataFrame df whose rows are generated like this:
import random

# Note: `fake` (presumably a Faker instance), `sex_list` and `accounts_list`
# are defined elsewhere in the asker's code and are not shown here.
def fake_data():
    return {'Name': fake.name(),
            'Gender': random.choice(sex_list),
            'Address': fake.street_address(),
            'Nationality': 'Zimbabwean',
            'Account_Type': random.choice(accounts_list),
            'Age': random.randint(0, 2),              # 0, 1 or 2
            'Education': random.random() > 0.5,       # True or False
            'Employment': random.randint(0, 2),
            'Salary': random.randint(0, 2),
            'Employer_Stability': random.random() > 0.5,
            'Consistency': random.random() > 0.5,
            'Balance': random.randint(0, 2),
            'Residential_Status': random.random() > 0.5
            }
I want to create a Service_Level column with the value 0, 1 or 2, based on conditions on the following columns:
columns = ['Age','Education', 'Employment', 'Salary', 'Employer_Stability', 'Consistency', 'Balance', 'Residential_Status']
After reading some of the answers here, I tried the following code to produce ['Service_Level'] = 0:
df['Service_Level'] = np.where((df['Age']==0)&(df['Education']==False)&(df['Employment']==0)&(df['Salary']==0)&(df['Employer_Stability']==False)&(df['Consistency']==False)&(df['Balance']==0)&(df['Residential_Status']==False),
(df['Age'])|(df['Education'])|(df['Employment'])|(df['Salary'])|(df['Employer_Stability'])|(df['Consistency'])|(df['Balance'])|(df['Residential_Status']), 0)
Then this, for ['Service_Level'] = 1:
df['Service_Level'] = np.where((df['Age']==1)&(df['Education']==True)&(df['Employment']==1)&(df['Salary']==1)&(df['Employer_Stability']==False)&(df['Consistency']==True)&(df['Balance']==1)&(df['Residential_Status']==True),
(df['Age'])|(df['Education'])|(df['Employment'])|(df['Salary'])|(df['Employer_Stability'])|(df['Consistency'])|(df['Balance'])|(df['Residential_Status']), 1)
And then this, for ['Service_Level'] = 2:
df['Service_Level'] = np.where((df['Age']==2)&(df['Education']==True)&(df['Employment']==2)&(df['Salary']==2)&(df['Employer_Stability']==True)&(df['Consistency']==True)&(df['Balance']==2)&(df['Residential_Status']==True),
(df['Age'])|(df['Education'])|(df['Employment'])|(df['Salary'])|(df['Employer_Stability'])|(df['Consistency'])|(df['Balance'])|(df['Residential_Status']), 2)
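Unfortunately, I don't know how to combine these conditions so that I end up with 0, 1 or 2.
If this is feasible, what happens to rows that do not follow these exact conditions? I would then also like to produce an output.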
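You probably need to combine slicing with np.where (which, by the way, takes three arguments: the condition, val1 (the value used where the condition is true), and val2 (the value used otherwise)).
Your first statement becomes:
df['Service_Level'] = np.where(condition_1, 0, 1)
For rows that meet the first condition, this sets df['Service_Level'] to 0; all other rows get 1.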
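As a minimal illustration of that three-argument form, here is a sketch on a tiny made-up frame. The data and the cut-down two-column condition_1 are assumptions for the example, not the asker's real data or full condition.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): only two of the question's columns, three rows.
df = pd.DataFrame({
    "Age": [0, 1, 0],
    "Education": [False, True, True],
})

# True only where every listed column holds its "level 0" value.
condition_1 = (df["Age"] == 0) & ~df["Education"]

# np.where(condition, value_if_true, value_if_false)
df["Service_Level"] = np.where(condition_1, 0, 1)
print(df)
```

Only the first row satisfies condition_1, so it is the only row that receives 0.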
Now mask the data to get only the rows where Service_Level is not 0:
df[df['Service_Level'] !=0]
On this subset, you can apply the second condition with
np.where(condition_2, 1, 2)
which assigns 1 to df['Service_Level'] where the condition is true and 2 to the remaining rows.
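The two steps could be put together roughly as follows. This is only a sketch: the toy data, the simplified conditions, and the use of df.loc to write the result back into the masked rows are assumptions, since the answer itself shows just the mask and the np.where call.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical), reduced to two of the question's columns.
df = pd.DataFrame({
    "Age": [0, 1, 2, 1],
    "Education": [False, True, True, False],
})

condition_1 = (df["Age"] == 0) & ~df["Education"]  # stands in for the "level 0" rule
condition_2 = (df["Age"] == 1) & df["Education"]   # stands in for the "level 1" rule

# Step 1: 0 where condition_1 holds, provisionally 1 everywhere else.
df["Service_Level"] = np.where(condition_1, 0, 1)

# Step 2: among the rows that are not 0, keep 1 where condition_2 holds, else 2.
not_zero = df["Service_Level"] != 0
df.loc[not_zero, "Service_Level"] = np.where(condition_2[not_zero], 1, 2)
print(df)
```

Note that with this approach every row matching neither condition ends up as 2, including rows that do not satisfy the asker's "level 2" rule.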
Edit:
You can also nest a second np.where, carrying the second condition, inside the first one, as shown below.
df['Service_Level'] = np.where(cond_1, 0, (np.where(cond_2, 1,2)))
For readability, you may want to store the conditions in variables such as cond_1 first and then use them inside np.where.
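A sketch of that suggestion, again with simplified, hypothetical conditions on toy data. It also shows np.select, a NumPy function the answer does not mention, as one possible way to give rows that match none of the conditions an explicit value. The cond_3 name, the Service_Level_Strict column and the -1 placeholder are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical), reduced to two of the question's columns.
df = pd.DataFrame({
    "Age": [0, 1, 2, 1],
    "Education": [False, True, True, False],
})

# Named conditions keep the np.where call short and readable.
cond_1 = (df["Age"] == 0) & ~df["Education"]  # rule for level 0
cond_2 = (df["Age"] == 1) & df["Education"]   # rule for level 1
cond_3 = (df["Age"] == 2) & df["Education"]   # rule for level 2 (hypothetical name)

# Nested np.where: whatever is neither level 0 nor level 1 becomes 2.
df["Service_Level"] = np.where(cond_1, 0, np.where(cond_2, 1, 2))

# np.select: the first matching condition wins; unmatched rows get `default`.
df["Service_Level_Strict"] = np.select(
    [cond_1, cond_2, cond_3], [0, 1, 2], default=-1
)
print(df)
```

In this toy frame the last row matches none of the three rules, so the nested np.where labels it 2 while np.select marks it with the placeholder, which may be closer to what the question asks about rows that do not follow the exact conditions.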