How to one-hot encode labels and get the number of classes programmatically in Python? For artificial neural networks
I have a list called y_train that looks like this:
[ 307 307 307 ... 257947 257947 257947]
Values such as 307 and 257947 are distinct IDs that I want to one-hot encode. There are 480 classes in total. The list y_train has a length of 10799, which is also the number of rows. How can I one-hot encode it so that the result shows 480 classes and 10799 rows? I'm trying to fit this in a TensorFlow model.
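One way to get the number of classes programmatically, without pandas or scikit-learn, is sketched below with NumPy only. It assumes y_train is (or can be converted to) a 1-D NumPy array; the short array here is a hypothetical stand-in for the real 10799-element one, so the printed numbers refer to the stand-in, not to the 480-class data.

```python
import numpy as np

# Hypothetical stand-in for y_train; the real array has 10799 IDs and 480 classes.
y_train = np.array([307, 307, 307, 1024, 257947, 257947, 257947])

# np.unique returns the sorted distinct IDs and, with return_inverse=True,
# the position of each label within that sorted array (0..num_classes-1).
classes, class_idx = np.unique(y_train, return_inverse=True)
num_classes = len(classes)

# Indexing rows of an identity matrix gives a (n_rows, num_classes) one-hot array.
one_hot = np.eye(num_classes, dtype=np.float32)[class_idx]

print(num_classes)    # 3
print(one_hot.shape)  # (7, 3)
```

On the real data, num_classes and one_hot.shape should come out as 480 and (10799, 480) if the question's counts are accurate.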
There are multiple ways to do this. One is to use pandas.get_dummies, which converts categorical variables into dummy/indicator variables.
Example:
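```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'a'], 'B': ['b', 'a', 'c'],
                   'C': [1, 2, 3]})
# dtype=int keeps the indicator columns as 0/1 (pandas 2.0+ defaults to bool)
pd.get_dummies(df, prefix=['col1', 'col2'], dtype=int)
```

```
   C  col1_a  col1_b  col2_a  col2_b  col2_c
0  1       1       0       0       1       0
1  2       0       1       1       0       0
2  3       1       0       0       0       1
```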
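Applied to a 1-D label array like y_train, get_dummies produces one column per distinct ID, so the number of classes is simply the column count. A minimal sketch, using a small hypothetical stand-in for y_train:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for y_train.
y_train = np.array([307, 307, 1024, 257947, 257947])

# On a 1-D Series, get_dummies creates one 0/1 column per distinct ID;
# the column labels are the sorted unique IDs.
dummies = pd.get_dummies(pd.Series(y_train), dtype=int)

num_classes = dummies.shape[1]   # number of distinct IDs
labels = dummies.to_numpy()      # (n_rows, num_classes) array to pass to the model

print(dummies.shape)             # (5, 3)
print(dummies.columns.tolist())  # [307, 1024, 257947]
```

Keeping dummies.columns around is useful for mapping a predicted column index back to the original ID.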
Another method is to use scikit-learn's OneHotEncoder.
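A minimal sketch of the scikit-learn route, assuming scikit-learn is installed and again using a small hypothetical stand-in for y_train. It relies on the encoder's default sparse output and converts it with toarray(), which avoids the sparse/sparse_output parameter rename between scikit-learn versions.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for y_train.
y_train = np.array([307, 307, 1024, 257947, 257947])

# OneHotEncoder expects 2-D input (n_samples, n_features), so the 1-D label
# array is reshaped into a single column.
encoder = OneHotEncoder()
encoded = encoder.fit_transform(y_train.reshape(-1, 1))  # sparse matrix by default

labels = encoded.toarray()                 # dense (n_rows, num_classes) array
num_classes = len(encoder.categories_[0])  # distinct IDs seen during fit

print(labels.shape)  # (5, 3)
print(num_classes)   # 3
```

The fitted encoder can be reused with encoder.transform(...) on validation or test labels so their columns line up with the training labels. With one-hot targets like these, a Keras classifier would typically end in a Dense layer with num_classes units and be compiled with the categorical_crossentropy loss.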