
How to one-hot encode with multiple labels in Python?

I have a table in CSV that looks something like this:

id   attribute
1    Canada
1    United States
2    Germany
3    Canada
4    Germany
4    United States

I want to turn the table above into:

id   attribute.Canada   attribute.UnitedStates   attribute.Germany
1    1.0                1.0                      0.0
2    0.0                0.0                      1.0
3    1.0                0.0                      0.0
4    0.0                1.0                      1.0

I wish to accomplish three things:

  1. each row has a unique id
  2. the values under the "attribute" label become column names that are one-hot encoded
  3. export the new table back to CSV
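One dependency-free way to meet all three goals is sketched below using only the standard `csv` module. This is not the approach described in the answer; it assumes the input is comma-separated (the question's table looks whitespace-aligned) and reads from an in-memory string instead of a real file.

```python
import csv
import io

# Sample input mirroring the question's table (comma-separated layout assumed).
raw_csv = """id,attribute
1,Canada
1,United States
2,Germany
3,Canada
4,Germany
4,United States
"""

rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Distinct attribute values in first-seen order; these become the new columns.
attributes = list(dict.fromkeys(r["attribute"] for r in rows))

# Goal 1: collect every attribute seen for each id, giving one output row per id.
seen = {}
for r in rows:
    seen.setdefault(r["id"], set()).add(r["attribute"])

# Goal 2: one column per attribute, 1.0 if the id has it, otherwise 0.0.
header = ["id"] + ["attribute." + a.replace(" ", "") for a in attributes]
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(header)
for id_, attrs in seen.items():
    writer.writerow([id_] + [1.0 if a in attrs else 0.0 for a in attributes])

# Goal 3: out.getvalue() holds the new CSV; open a file path instead to write to disk.
print(out.getvalue())
```

Because the columns follow first-seen order, the header comes out as `id,attribute.Canada,attribute.UnitedStates,attribute.Germany`, matching the desired table.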

I'd just like to give you a head start. Take the unique values of the attribute column and append them to an array that already contains 'id'. Then get the unique values of 'id' (i.e. 1, 2, 3, 4) and use them as the index, with that array as the columns, to initialize the DataFrame.
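The initialization step can be sketched with pandas as follows. This is a minimal sketch, assuming a comma-separated input read from an in-memory string; the names `src`, `columns` and `result` are illustrative only.

```python
import io

import pandas as pd

# Sample input mirroring the question's table (comma-separated layout assumed).
raw_csv = """id,attribute
1,Canada
1,United States
2,Germany
3,Canada
4,Germany
4,United States
"""
src = pd.read_csv(io.StringIO(raw_csv))

# Array that starts with "id", followed by each distinct attribute (first-seen order).
columns = ["id"] + list(src["attribute"].unique())

# Unique ids become the index and the array becomes the columns; every cell starts as NaN.
result = pd.DataFrame(index=src["id"].unique(), columns=columns, dtype=float)
print(result)
```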

Iterate through the unique values of 'id', and for each one use a regex to read the lines that start with that 'id' value. Extract the attribute values, build a dictionary mapping each of them to 1.0, and append it to the DataFrame; afterwards, replace the remaining NaN values with 0.0.
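The loop described above might look roughly like this. It is a sketch under assumptions: the raw text uses the whitespace-aligned layout shown in the question, the id is kept in the index rather than in a separate 'id' column, and, since `DataFrame.append` was removed in pandas 2.0, each dictionary is written into its row with `.loc` instead of being appended.

```python
import re

import pandas as pd

# Raw text in the whitespace-aligned layout shown in the question (assumed).
raw_text = """id   attribute
1    Canada
1    United States
2    Germany
3    Canada
4    Germany
4    United States
"""
data_lines = raw_text.splitlines()[1:]  # drop the header line

# Unique ids as the index, unique attributes as the columns; all cells start as NaN.
ids = list(dict.fromkeys(int(line.split()[0]) for line in data_lines))
attributes = list(dict.fromkeys(line.split(maxsplit=1)[1].strip() for line in data_lines))
result = pd.DataFrame(index=ids, columns=attributes, dtype=float)

for id_ in ids:
    # Lines starting with this id followed by whitespace; capture the attribute text.
    pattern = re.compile(rf"^{id_}\s+(.+?)\s*$", re.MULTILINE)
    row = {attr: 1.0 for attr in pattern.findall(raw_text)}
    # Write the dictionary's values into this id's row.
    result.loc[id_, list(row)] = list(row.values())

# Attributes an id never had are still NaN; replace them with 0.0.
result = result.fillna(0.0)
result.index.name = "id"
print(result.to_csv())
```

Requiring whitespace after the id in the pattern keeps id 1 from also matching lines that start with, say, 10.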
