
How do I split into train and test set after creating dummy variables?

I have already created dummy variables for all my categorical columns, and now I need to split my data into train and test sets, with "Loan_Status" as my target. I am confused because creating dummy variables produces two new columns for "Loan_Status", so when and how should I split my data and create the target?

# Convert the categorical features into dummy variables.

df_dummies = pd.get_dummies(df1, columns=['Gender', 'Married', 'Dependents', 'Education', 'Self_Employed', 'Loan_Status'])
df_dummies.head()

This turns into the following: [image: Loan_Status after creating dummy variables, now two columns]

It looked like this before: [image] So how would I create the target from Loan_Status? Wouldn't splitting the data before creating the dummies cause issues?

As a rule of thumb, you should stick to pd.get_dummies(drop_first=True, ...) to avoid creating redundant columns, as N-1 columns contain full information about N possible values.
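To illustrate the effect of drop_first, here is a minimal sketch on made-up data. The column names follow the question, but the values ("Male", "Female", "0", "1", "3+") are assumptions chosen only for the demonstration.

```python
import pandas as pd

# Toy frame: column names mirror the question, values are invented.
df1 = pd.DataFrame({
    "Gender": ["Male", "Female", "Male"],
    "Dependents": ["0", "1", "3+"],
})

# Default behaviour: one column per category (2 + 3 = 5 columns).
full = pd.get_dummies(df1, columns=["Gender", "Dependents"])

# drop_first=True: the first category of each column is dropped (1 + 2 = 3 columns).
reduced = pd.get_dummies(df1, columns=["Gender", "Dependents"], drop_first=True)

print(list(full.columns))
print(list(reduced.columns))
```

The dropped category is still recoverable: a row with zeros in all remaining dummy columns of a feature belongs to the category that was dropped.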

However, one-hot encoding is excessive for a binary value such as Loan_Status; you are probably better off using something like .map() to turn it into a single 0/1 column.
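One way to put this together is sketched below: map the target to a single column, create dummies for the feature columns only, then split. This assumes Loan_Status holds the values "Y" and "N", which the question does not show, and it uses scikit-learn's train_test_split, which the original post does not name. The data is invented for the example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data: values are made up; "Y"/"N" for Loan_Status is an assumption.
df1 = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female"],
    "Married": ["Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No"],
    "ApplicantIncome": [5849, 4583, 3000, 2583, 6000, 5417, 2333, 3036],
    "Loan_Status": ["Y", "N", "Y", "Y", "N", "Y", "N", "N"],
})

# Encode the binary target as one 0/1 column instead of two dummy columns.
y = df1["Loan_Status"].map({"Y": 1, "N": 0})

# Create dummies for the categorical features only; the target is left out.
X = pd.get_dummies(
    df1.drop(columns="Loan_Status"),
    columns=["Gender", "Married"],
    drop_first=True,
)

# Split features and target together so the rows stay aligned.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```

Because the dummies are created before the split, the train and test sets end up with identical columns, even if some category happens to be missing from one of them.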

