
Making my data fit Keras Sequential Models and Dense Layers and produce output

I have structured data that looks like this:

faults.head()

  | Fault | DEALER | FAILMODE | FAILCODEMODE | DAYS UNTIL FAILURE | TERRITORY CODE | DESIGN PHASE CODE | PLANT ID CODE
0 | CAMPAIGN/TRP | 31057 | CAMPAIGN | BNRBC1 | 283.0 | 102 | 62 | 82
1 | INTERMITTENT PROBL | 24126 | SPECIAL (NO FAILURE) | XXIPNF | 126.0 | 102 | 62 | 82
2 | DSID #DSBCG2058 TAG #362783 EXHAUST SYSTEM. U... | 0 | CLOGGED, PLUGGED WITH FOREIGN MATERIAL, DIRT/D... | USDVDR | 118.0 | 102 | 62 | 82
3 | INTERMITTENT PROBL | 20943 | SPECIAL (NO FAILURE) | XXIPNF | 97.0 | 102 | 62 | 82
4 | CAMPAIGN | 19134 | CAMPAIGN | USSCR1 | 315.0 | 102 | 62 | 82

I'm trying to predict the class FAILMODE. There are only 122 unique values in FAILMODE; those are my classes.

Based on all the other data in the rows, I want the result of the computation on my test set to be a one-hot matrix, or even the class itself. Here's my code so far:

from keras.models import Sequential
from keras.layers import Dense
Using Theano backend.

faults_testing = faults[:14843]
faults_training = faults[14844:]

model = Sequential()
model.add(Dense(len(faults.FAILMODE.unique()) + 20, input_dim=len(faults_training), init='uniform', activation='relu'))
model.add(Dense(len(faults_training), init='uniform', activation='relu'))
model.add(Dense(len(faults.FAILMODE.unique()), init='uniform', activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

This is where the tutorial says:

model.fit(X, Y, nb_epoch=len(faults_training), batch_size=10)

I don't know what X or Y is, so I just tried the following:

model.fit(faults_training['FAILMODE'], faults_testing['FAILMODE'], nb_epoch=len(faults_training), batch_size=10)

It resulted in this error:

ValueError                                Traceback (most recent call last)
<ipython-input-54-e8765933cfb9> in <module>()
----> 1 model.fit(faults_training['FAILMODE'], faults_testing['FAILMODE'], nb_epoch=len(faults_training), batch_size=10)

ValueError: Error when checking model input: expected dense_input_1 to have shape (None, 34631) but got array with shape (34631L, 1L)

Please be thorough with your answer. Thank you!

A regular neural network (including a Keras Sequential model) only accepts floats for the data (X) and integers or a one-hot encoding for the labels/classes (Y), so you need to convert your dataset to match these requirements. Here is what you may want to do:

  1. Map all categorical (string) values (e.g. CAMPAIGN/TRP, BNRBC1, XXIPNF) to floats (it is better if you also normalize the data).
  2. Put all the data columns (excluding the label column) in X.
  3. Put the label column (it should be one column only) in Y, map each class to an integer index, and convert it to a one-hot encoding with to_categorical, e.g. Y = to_categorical(Y).
  4. Split the data into training and testing sets with a function such as train_test_split: X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.33)
  5. Train the model with model.fit(X_train, Y_train, nb_epoch=100, batch_size=10). Adjust nb_epoch (named epochs in Keras 2 and later) and batch_size afterwards, depending on the training speed and accuracy you need.
  6. Evaluate the accuracy with scores = model.evaluate(X_test, Y_test, batch_size=10).
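The six steps above can be sketched end to end roughly as follows. This is only an illustration under several assumptions: the small `faults` DataFrame is made-up stand-in data, the layer sizes and the epoch count are arbitrary, and it uses the Keras 2+ argument name `epochs` rather than `nb_epoch`. It also swaps in a softmax output with `categorical_crossentropy`, a common pairing for single-label multi-class problems, in place of the question's sigmoid and `binary_crossentropy`. Note that `input_dim` is the number of feature columns, not the number of rows, which is what the shape error in the question points at.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical

# Hypothetical stand-in for the real `faults` DataFrame (made-up rows).
faults = pd.DataFrame({
    "Fault": ["CAMPAIGN/TRP", "INTERMITTENT PROBL", "CAMPAIGN", "INTERMITTENT PROBL"] * 10,
    "DEALER": [31057, 24126, 19134, 20943] * 10,
    "FAILMODE": ["CAMPAIGN", "SPECIAL (NO FAILURE)", "CAMPAIGN", "SPECIAL (NO FAILURE)"] * 10,
    "DAYS UNTIL FAILURE": [283.0, 126.0, 315.0, 97.0] * 10,
})

# Step 1: map string feature columns to integer codes.
features = faults.drop(columns=["FAILMODE"]).copy()
for col in ["Fault"]:  # the string-valued feature columns in this toy frame
    features[col] = features[col].astype("category").cat.codes

# Step 2: build X as floats and scale each column to the range [0, 1].
X = features.to_numpy(dtype="float32")
span = X.max(axis=0) - X.min(axis=0)
X = (X - X.min(axis=0)) / np.where(span == 0, 1, span)

# Step 3: map each class name to an integer index, then one-hot encode.
codes, classes = pd.factorize(faults["FAILMODE"])
Y = to_categorical(codes, num_classes=len(classes))

# Step 4: split into training and testing sets.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.33, random_state=0)

# The input dimension is the number of feature columns, not the number of rows.
model = Sequential()
model.add(Dense(16, input_dim=X.shape[1], activation="relu"))
model.add(Dense(len(classes), activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])

# Step 5: train (tiny settings, only to keep the sketch fast).
model.fit(X_train, Y_train, epochs=5, batch_size=10, verbose=0)

# Step 6: evaluate, then turn predicted probabilities back into class names.
loss, accuracy = model.evaluate(X_test, Y_test, batch_size=10, verbose=0)
predicted = classes[model.predict(X_test, verbose=0).argmax(axis=1)]
```

Taking `argmax` over each predicted row and indexing back into `classes` is one way to get "the class itself" rather than a one-hot matrix.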

You can also check this article for ideas on how to convert categorical data to numbers: http://fastml.com/converting-categorical-data-into-numbers-with-pandas-and-scikit-learn/
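For a quick feel of what such a conversion can look like in pandas, here is a minimal sketch. The DataFrame is a made-up sample that borrows a few FAILCODEMODE values from the question, and the two options shown are illustrations rather than the only possible approach.

```python
import pandas as pd

# Hypothetical sample reusing FAILCODEMODE values from the question.
df = pd.DataFrame({"FAILCODEMODE": ["BNRBC1", "XXIPNF", "USDVDR", "XXIPNF"]})

# Option A: integer codes, assigned in order of first appearance.
codes, uniques = pd.factorize(df["FAILCODEMODE"])

# Option B: one-hot (dummy) columns, one per distinct value.
one_hot = pd.get_dummies(df["FAILCODEMODE"], dtype=float)
```

Integer codes keep X narrow but imply an ordering between categories that does not really exist. One-hot columns avoid that, at the cost of one extra column per distinct value.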
