How to one-hot encode with multiple labels in Python?
I have a table in CSV that looks something like this:
id attribute
1 Canada
1 United States
2 Germany
3 Canada
4 Germany
4 United States
I want to turn the table above into:
id attribute.Canada attribute.UnitedStates attribute.Germany
1 1.0 1.0 0.0
2 0.0 0.0 1.0
3 1.0 0.0 0.0
4 0.0 1.0 1.0
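For reference, here is a minimal sketch of the target transformation using pandas' built-in `pd.crosstab`. This is a swapped-in technique, not the approach described in the answer below. It assumes the CSV has already been loaded (for example with `pd.read_csv`) into two columns named `id` and `attribute`. Note that `crosstab` sorts the attribute columns alphabetically, so their order differs slightly from the desired table.

```python
import pandas as pd

# The question's table, assumed already loaded into columns "id" and "attribute".
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 4, 4],
    "attribute": ["Canada", "United States", "Germany",
                  "Canada", "Germany", "United States"],
})

# Count each (id, attribute) pair, cap repeated pairs at 1, and convert to
# float so the cells read 1.0 / 0.0 as in the desired output.
encoded = (
    pd.crosstab(df["id"], df["attribute"])
    .clip(upper=1)
    .astype(float)
    .add_prefix("attribute.")
)
print(encoded)
```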
I wish to accomplish three things:
I'd only like to give you a head start. Take the unique values of the attribute column and append them to an array that already contains 'id'. Then get the unique values of 'id' (i.e. 1 2 3 4) and use them as the index, with that array as the columns, to initialize the DataFrame.
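The initialization step can be sketched as follows. This is a minimal illustration, assuming the CSV is already loaded into a DataFrame `df` with columns `id` and `attribute`; the names `columns` and `result` are hypothetical.

```python
import pandas as pd

# The question's table, assumed already loaded into columns "id" and "attribute".
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 4, 4],
    "attribute": ["Canada", "United States", "Germany",
                  "Canada", "Germany", "United States"],
})

# Start the column array with "id", then append each unique attribute value
# (in order of first appearance).
columns = ["id"]
columns.extend(df["attribute"].unique())

# Use the unique ids (1 2 3 4) as the index and the array as the columns.
# Every attribute cell starts out as NaN.
result = pd.DataFrame(index=df["id"].unique(), columns=columns)
result["id"] = result.index  # keep the id visible as a column as well
print(result)
```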
Iterate through the unique values of 'id', using a regex to read the lines that start with each 'id' value. Extract the attribute values, build a dictionary that maps each one to 1.0, and append it to the DataFrame; afterwards, replace the NaN values with 0.0.
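A sketch of the iteration step, under stated assumptions: the file is read as raw text lines in the whitespace-separated `id attribute` layout shown in the question (a comma-separated file would need `,` in place of `\s+` in the pattern), and `raw`, `ids`, and `rows` are hypothetical names. Because `DataFrame.append` was removed in pandas 2.0, the per-id dictionaries are collected in a list and turned into a DataFrame in one go, which produces the same NaN-then-0.0 result.

```python
import re

import pandas as pd

# Hypothetical raw file contents, assumed whitespace-separated with a header row.
raw = """id attribute
1 Canada
1 United States
2 Germany
3 Canada
4 Germany
4 United States"""

lines = raw.splitlines()[1:]  # skip the header row

# Unique ids in order of first appearance, taken from the leading number.
ids = []
for line in lines:
    key = int(line.split(maxsplit=1)[0])
    if key not in ids:
        ids.append(key)

rows = []
for key in ids:
    # Match lines that start with this exact id followed by whitespace,
    # capturing the rest of the line as the attribute.
    pattern = re.compile(rf"^{key}\s+(.+)$")
    row = {"id": key}
    for line in lines:
        match = pattern.match(line)
        if match:
            row[match.group(1).strip()] = 1.0  # each attribute present -> 1.0
    rows.append(row)

# Build the frame from the dictionaries; absent attributes become NaN,
# which are then replaced with 0.0.
result = pd.DataFrame(rows).set_index("id").fillna(0.0)
print(result)
```

The pattern requires whitespace right after the id, so an id of `1` does not accidentally match lines for `10` or `11`.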