How to one hot encode with multiple labels in Python?

Question

I have a table in CSV that looks something like this:

id   attribute
1    Canada
1    United States
2    Germany
3    Canada
4    Germany
4    United States

I want to turn the table above into:

id   attribute.Canada   attribute.UnitedStates   attribute.Germany
1    1.0                1.0                      0.0
2    0.0                0.0                      1.0
3    1.0                0.0                      0.0
4    0.0                1.0                      1.0

I wish to accomplish three things:

1. Each row has a unique id.
2. The values in the "attribute" column become one-hot encoded column names.
3. The new table is exported back to CSV.
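All three goals can be met in a few lines with pandas. The following is a minimal sketch, assuming the file is comma-separated with columns `id` and `attribute` (the inline `raw` string is a hypothetical stand-in for the real file). It uses `pd.crosstab`, which is a different technique from the step-by-step approach in the answer below.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; a real file would be read with
# pd.read_csv("input.csv") (file name assumed).
raw = """id,attribute
1,Canada
1,United States
2,Germany
3,Canada
4,Germany
4,United States
"""
df = pd.read_csv(io.StringIO(raw))

# One row per unique id, one column per distinct attribute value;
# each cell counts how often that (id, attribute) pair occurs.
encoded = pd.crosstab(df["id"], df["attribute"])

# Cap counts at 1 in case an id lists the same attribute twice, then use floats.
encoded = encoded.clip(upper=1).astype(float)

# Rename columns to the "attribute.Canada" style used in the question.
encoded.columns = ["attribute." + name.replace(" ", "") for name in encoded.columns]

# Serialize to CSV text; pass a path such as "output.csv" to write a file instead.
csv_text = encoded.to_csv()
```

`crosstab` sorts the new columns alphabetically (Canada, Germany, UnitedStates); select them in a different order with `encoded[[...]]` if the layout in the question matters.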

Answer 1

I'd just like to give you a head start. Take the unique values of the attribute column and append them to an array initialized with 'id'. Get the unique values of 'id' (i.e. 1, 2, 3, 4) and use them as the index, and the array as the columns, to initialize the DataFrame.
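The initialization step can be sketched like this. It is a minimal sketch, assuming pandas and a comma-separated file; `raw`, `columns`, `ids`, and `encoded` are hypothetical names. Every attribute cell starts as NaN, to be filled in by the loop in the next paragraph.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; a real file would be read with
# pd.read_csv("input.csv") (file name assumed).
raw = """id,attribute
1,Canada
1,United States
2,Germany
3,Canada
4,Germany
4,United States
"""
df = pd.read_csv(io.StringIO(raw))

# Array initialized with 'id', then the unique attribute values appended
# in order of first appearance: ['id', 'Canada', 'United States', 'Germany'].
columns = ["id"] + list(df["attribute"].unique())

# Unique ids (1, 2, 3, 4) become the index; every cell starts as NaN.
ids = df["id"].unique()
encoded = pd.DataFrame(index=ids, columns=columns, dtype=float)

# Keep the ids as a regular column too, matching the 'id' entry in the column list.
encoded["id"] = ids
```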

Iterate through the unique values of 'id'; for each one, use a regex to read the lines that start with that 'id' value. Extract the attribute values, build a dictionary that maps each of them to 1.0, and append it to the DataFrame. Finally, replace the NaN values with 0.0.

solution1
1 2019-12-16 05:27:36