
Converting and mapping categorical data

I have a dataset with both categorical and numeric columns. I want to convert the categorical data to numeric, mapping each category to a specific numeric value. For example, the column ['Education'] contains Highschool, Undergraduate, Graduate, PHD, etc. I'd appreciate it if someone could provide code to map each category to an arbitrary numeric value.

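import pandas as pd

# One-hot encoding: one 0/1 indicator column per category
df = pd.DataFrame(
    ["Highschool", "Undergraduate", "Highschool", "Graduate", "PHD", "Graduate", "Graduate", "Undergraduate"],
    columns=["Education"],
)
# dtype=int keeps the indicators as 0/1; recent pandas versions default to True/False
df_transformed = pd.get_dummies(df, dtype=int)

df_transformed.head()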

Output:

   Education_Graduate  Education_Highschool  Education_PHD  Education_Undergraduate
0                   0                     1              0                        0
1                   0                     0              0                        1
2                   0                     1              0                        0
3                   1                     0              0                        0
4                   0                     0              1                        0

# Label encoding

from sklearn import preprocessing

encoder = preprocessing.LabelEncoder()
encoder.fit(df["Education"].values)

# Pass any list of labels seen during fit; each one is converted to its integer code.
# A sample list is used here.
encoder.transform(["Undergraduate", "Highschool", "Graduate"])  # returns array([3, 1, 0])
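
Note that LabelEncoder assigns the integers in sorted label order (Graduate, Highschool, PHD, Undergraduate), so you don't get to choose which value each category receives. If you need specific values, one option is to pass a plain dict to `Series.map`. The sketch below is only an illustration: the `education_codes` dict, its values, and the `Education_code` column name are assumptions, not something from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Education": [
            "Highschool", "Undergraduate", "Highschool", "Graduate",
            "PHD", "Graduate", "Graduate", "Undergraduate",
        ]
    }
)

# Hypothetical hand-picked mapping; any numeric values can be used here.
education_codes = {"Highschool": 1, "Undergraduate": 2, "Graduate": 3, "PHD": 4}

# Series.map looks each category up in the dict.
# Categories missing from the dict would come out as NaN.
df["Education_code"] = df["Education"].map(education_codes)

print(df)
```

Unlike one-hot encoding, this keeps a single column, which makes sense when the categories have a natural order, as education levels arguably do.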
