Converting a list of strings to a list of int for machine learning
I have a table with a column corresponding to the education level: "phd", "undergrad", etc. I would like to change those features to 0, 1, 2, ... in order to use the data as input to a machine learning algorithm.
Is there a way in Python to automatically map those string features to integers?
What you want is called a dict (dictionary). Something like this:
edu_level = {
    "phd": 0,
    "master": 1,
    "undergrad": 2,
    # ...
}
Look up how to work with dictionaries, perhaps using the search keywords "Python dictionary tutorial".
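To make the dictionary approach above concrete, here is a minimal sketch in plain Python. The sample column `education` and the name `auto_level` are made up for illustration; the second half builds the mapping automatically from whatever distinct values appear, assigning codes in alphabetical order (an arbitrary choice, not something the question specifies).

```python
# Hypothetical sample column; only "phd", "master" and "undergrad"
# are named in the thread.
education = ["phd", "undergrad", "master", "phd"]

# Hand-written mapping, as in the answer above.
edu_level = {"phd": 0, "master": 1, "undergrad": 2}
encoded = [edu_level[level] for level in education]

# Automatic mapping: one integer per distinct value, in sorted order.
auto_level = {level: code for code, level in enumerate(sorted(set(education)))}
auto_encoded = [auto_level[level] for level in education]

print(encoded)
print(auto_encoded)
```

With the hand-written dict you control which level gets which integer; the automatic version saves typing but the codes carry no meaningful order.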
You will likely not want to feed the data as it is to a machine learning algorithm. For instance, if a phd is a 2 and a masters is a 1, does that mean that the phd is twice as good? You might instead use "one hot encoding" and create a binary matrix (i.e. 1's and 0's) that you can feed in.
There are multiple libraries to do this; one such is: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
Good luck!
          BA  MS  PHD
Person A   0   0   0
Person B   1   0   0
Person C   0   0   1
...
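As a rough illustration of the one hot encoding idea, here is a sketch in plain Python that needs no library. The helper `one_hot` and the sample values are hypothetical; the scikit-learn OneHotEncoder linked above serves the same purpose, and this is not a reproduction of its API.

```python
def one_hot(values):
    """Return (categories, matrix): one column per distinct value,
    one row per input value, with a single 1 in each row."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    matrix = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column for this value
        matrix.append(row)
    return categories, matrix


# Hypothetical sample column.
education = ["undergrad", "phd", "master"]
categories, matrix = one_hot(education)

print(categories)
for row in matrix:
    print(row)
```

Each row then says only which level applies, without implying that one level is numerically larger than another.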