How do I split into train and test set after creating dummy variables?

Question

I have already created dummy variables for all my categorical columns, but I need to split my data into train and test set, with my target being "Loan_Status". I am confused because after creating dummy variables, this creates two new columns for "Loan_Status", so when or how would I split my data and create the target?

# Convert the categorical features into dummy variables.

df_dummies = pd.get_dummies(df1, columns=['Gender', 'Married', 'Dependents', 'Education', 'Self_Employed', 'Loan_Status'])
df_dummies.head()

turns into this

It looked like this before. So how would I create the target to be "Loan_Status"? Wouldn't splitting the data before creating the dummies cause issues?

Answer 1

As a rule of thumb, you should stick to pd.get_dummies(..., drop_first=True) to avoid creating redundant columns, since N-1 columns contain full information about N possible values.
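To illustrate the effect of drop_first, here is a minimal sketch on a made-up frame. The column name is taken from the question, but the values are hypothetical and only serve the example.

```python
import pandas as pd

# Hypothetical toy data; the values are assumptions, not the asker's dataset.
toy = pd.DataFrame({"Education": ["Graduate", "Not Graduate", "Graduate"]})

# Default behaviour: one indicator column per category (two columns here).
full = pd.get_dummies(toy, columns=["Education"])

# drop_first=True drops the first category, leaving N-1 indicator columns.
reduced = pd.get_dummies(toy, columns=["Education"], drop_first=True)

print(list(full.columns))
print(list(reduced.columns))
```

The dropped category is still recoverable: a row with 0 in every remaining indicator column belongs to it.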

However, one-hot encoding is a bit excessive for binary values; you're probably better off just using something like .map() to turn the column into a single 0/1 column.
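Putting this together, one possible workflow is to separate the target first, encode only the feature columns, and then split. The sketch below uses a small made-up stand-in for df1. It assumes "Loan_Status" contains "Y"/"N" (the question does not show the actual values), and the "Income" column is hypothetical. Encoding the features before the split also keeps the train and test sets on the same dummy columns.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for df1; values and the "Income" column are assumptions.
df1 = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female", "Male", "Female"],
    "Married": ["Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No"],
    "Income": [5000, 3000, 4000, 6000, 2500, 3500, 4500, 5500],
    "Loan_Status": ["Y", "N", "Y", "Y", "N", "Y", "N", "Y"],
})

# Target: map the binary label to 0/1 instead of one-hot encoding it.
# Assumes the labels are exactly "Y" and "N"; anything else would become NaN.
y = df1["Loan_Status"].map({"Y": 1, "N": 0})

# Features: drop the target, then create dummies for the categorical columns only.
X = pd.get_dummies(
    df1.drop(columns=["Loan_Status"]),
    columns=["Gender", "Married"],
    drop_first=True,
)

# Split features and target together so the rows stay aligned.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(X_train.shape, X_test.shape)
```

With the real data, the columns list would be the categorical feature columns from the question ('Gender', 'Married', 'Dependents', 'Education', 'Self_Employed'), without 'Loan_Status'.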
