
How to one-hot encode labels and get the number of classes programmatically in Python? For artificial neural networks

I have a list called y_train that looks like this:

[   307    307    307 ... 257947 257947 257947]

The values shown, such as 307 and 257947, are distinct IDs that I want to one-hot encode. There are 480 classes in total, and y_train has a length of 10799, which is also the number of rows. How can I one-hot encode it so that the result has 480 classes and 10799 rows? I'm trying to fit this into a TensorFlow model.

There are multiple ways to do this. One is to use pandas.get_dummies, which converts categorical variables into dummy/indicator variables.

Example:

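import pandas as pd

# A and B are categorical columns; C is numeric and is left unchanged.
df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'],
                   'C': [1, 2, 3]})

# dtype=int yields 0/1 columns; pandas 2.0+ returns booleans by default.
pd.get_dummies(df, prefix=['col1', 'col2'], dtype=int)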
   C  col1_a  col1_b  col2_a  col2_b  col2_c
0  1       1       0       0       1       0
1  2       0       1       1       0       0
2  3       1       0       0       0       1
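
The same function can be applied directly to a 1-D label array like the one in the question. The following is a minimal sketch, not a definitive implementation: the short `y_train` array is a made-up stand-in for the real data (the ID 5120 is invented for illustration), and the number of classes is read from the shape of the result rather than hard-coded.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for y_train: a few repeated integer IDs.
y_train = np.array([307, 307, 307, 5120, 5120, 257947, 257947])

# One column per distinct ID; dtype=int gives 0/1 instead of booleans.
one_hot = pd.get_dummies(y_train, dtype=int)

# Rows = number of samples, columns = number of distinct classes.
num_rows, num_classes = one_hot.shape
print(num_rows, num_classes)

# Convert to a plain NumPy array before passing it to a model.
y_train_encoded = one_hot.to_numpy()
```

With the data described in the question, the shape would be expected to come out as (10799, 480), assuming the list really contains 480 distinct IDs.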

Another method is to use scikit-learn's OneHotEncoder.
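
A minimal sketch of the OneHotEncoder approach, under the same assumptions as before: the short `y_train` array is a made-up stand-in for the real labels. The encoder expects 2-D input, so the labels are reshaped into a single column, and the number of classes is read from the fitted `categories_` attribute.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for y_train: a few repeated integer IDs.
y_train = np.array([307, 307, 307, 5120, 5120, 257947, 257947])

encoder = OneHotEncoder()

# OneHotEncoder expects a 2-D array, so reshape the labels to one column.
# The default result is a sparse matrix; toarray() makes it dense.
y_train_encoded = encoder.fit_transform(y_train.reshape(-1, 1)).toarray()

# categories_ holds the distinct values found for each input column.
num_classes = len(encoder.categories_[0])
print(y_train_encoded.shape, num_classes)
```

Keeping the fitted encoder around lets the same ID-to-column mapping be reused on validation or test labels through `encoder.transform`.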
