
Converting a list of strings to a list of int for machine learning

I have a table with a column corresponding to the education level: "phd", "undergrad", etc.

I would like to change those features to 0, 1, 2, ... in order to use the data as input to a machine learning algorithm.

Is there a way in Python to automatically map those string features to integers?

You can use enumerate if you want the integers to follow the order of the list.

>>> lista = ["phd", "undergrad", "etc"]
>>> list(enumerate(lista))
[(0, 'phd'), (1, 'undergrad'), (2, 'etc')]
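To get from those (index, value) pairs to an actual integer column, one possible sketch is to build a lookup from the distinct values and apply it to every row. The names `column`, `unique_levels`, `mapping`, and `encoded` and the sample data are made up for illustration, and the sketch assumes that numbering the levels in order of first appearance is acceptable.

```python
# Hypothetical column of education levels (illustrative data, not from the question)
column = ["phd", "undergrad", "phd", "master", "undergrad"]

# dict.fromkeys keeps only the first occurrence of each value, in order
unique_levels = list(dict.fromkeys(column))

# Invert enumerate's (index, value) pairs into a value -> index lookup
mapping = {level: index for index, level in enumerate(unique_levels)}

# Replace each string by its integer code
encoded = [mapping[level] for level in column]

print(mapping)  # {'phd': 0, 'undergrad': 1, 'master': 2}
print(encoded)  # [0, 1, 0, 2, 1]
```

Keeping `mapping` around lets you encode new data with the same codes later.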

It's called a dict (dictionary). Something like this:

edu_level = {
    "phd": 0,
    "master": 1,
    "undergrad": 2,
    # ...
}

Look up how to work with dictionaries, perhaps by searching for the keywords "Python dictionary tutorial".
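As a minimal sketch of how such a dictionary could be applied, the snippet below looks up each value of a column in it. The sample `column`, the `encode` helper, and the choice to raise an error on an unknown level are all assumptions for illustration, not part of the original answer.

```python
# Hand-written mapping, as in the answer above
edu_level = {"phd": 0, "master": 1, "undergrad": 2}

# Hypothetical column values (illustrative only)
column = ["undergrad", "phd", "master", "phd"]


def encode(levels, mapping):
    """Return the integer code for each level, failing loudly on unknown values."""
    missing = sorted(set(levels) - set(mapping))
    if missing:
        raise ValueError(f"no code defined for: {missing}")
    return [mapping[level] for level in levels]


encoded = encode(column, edu_level)
print(encoded)  # [2, 0, 1, 0]
```

If the table lives in a pandas DataFrame, `Series.map` accepts a dictionary and performs the same lookup on a column.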

You will likely not want to feed the data as is to a machine learning algorithm. For instance, if a phd is a 2 and a masters is a 1, does that mean that the phd is twice as good? You might instead use "one hot encoding" and create a binary matrix (i.e. 1s and 0s) that you can feed in.

There are multiple libraries to do this, e.g. this one: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
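To make the idea concrete without depending on any library, here is a plain-Python sketch of one-hot encoding. The sample `levels` and the `one_hot` helper are hypothetical; on real data the scikit-learn encoder linked above is likely the more practical choice.

```python
# Hypothetical column values (illustrative only)
levels = ["undergrad", "phd", "master", "phd"]


def one_hot(values):
    """Return (categories, rows) with one 0/1 column per distinct value."""
    categories = sorted(set(values))  # sorted so the column order is fixed
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # exactly one "hot" entry per row
        rows.append(row)
    return categories, rows


categories, matrix = one_hot(levels)
print(categories)  # ['master', 'phd', 'undergrad']
for row in matrix:
    print(row)
```

Each row contains a single 1, so no level is treated as numerically larger than another.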

Good luck!

          BA  MS  PHD
Person A   0   0    0
Person B   1   0    0
Person C   0   0    1
...


 