Keras model classification returns different results each time the model is recompiled

I'm starting out with Keras and building a model that classifies text labels from a couple of text input features, with a single output. I have one function that creates the model and another that tests it on a different dataset.

I'm still trying to fine-tune the model's predictions, but I'd like to understand why my test function gets different results every time the model is recreated. Is that normal? I'd also appreciate any tips for improving the model's accuracy.

import pickle

from keras.layers import Dense, Input
from keras.models import Model
from keras.preprocessing.text import Tokenizer
from sklearn.preprocessing import LabelBinarizer


def create_model(model_name, data, test_data):
    # train_size is computed but never used below: training uses every row,
    # and the "test" set is the first 40% of those same rows.
    train_size = int(len(data) * .9)
    test_size = int(len(data) * .4)

    train_headlines = data['Subject']
    train_category = data['Category']
    train_activities = data['Activity']

    test_headlines = data['Subject'][:test_size]
    test_category = data['Category'][:test_size]
    test_activities = data['Activity'][:test_size]

    # define Tokenizer with Vocab Sizes
    vocab_size1 = 10000
    vocab_size2 = 5000
    batch_size = 100
    tokenizer = Tokenizer(num_words=vocab_size1)
    tokenizer2 = Tokenizer(num_words=vocab_size2)

    test_headlines = test_headlines.astype(str)
    train_headlines = train_headlines.astype(str)
    test_category = test_category.astype(str)
    train_category = train_category.astype(str)

    # Both tokenizers are fitted on the test slice, then applied to train and test.
    tokenizer.fit_on_texts(test_headlines)
    tokenizer2.fit_on_texts(test_category)
    x_train = tokenizer.texts_to_matrix(train_headlines, mode='tfidf')
    x_test = tokenizer.texts_to_matrix(test_headlines, mode='tfidf')

    y_train = tokenizer2.texts_to_matrix(train_category, mode='tfidf')
    y_test = tokenizer2.texts_to_matrix(test_category, mode='tfidf')

    # load classes
    labels = []
    encoder = LabelBinarizer()
    encoder.fit(train_activities)
    text_labels = encoder.classes_
    with open('outputs/classes.txt', 'w') as f:
        for item in text_labels:
            f.write("%s\n" % item)
    z_train = encoder.transform(train_activities)
    z_test = encoder.transform(test_activities)
    num_classes = len(text_labels)
    print("num_classes: " + str(num_classes))

    input1 = Input(shape=(vocab_size1,), name='main_input')
    x1 = Dense(512, activation='relu')(input1)
    x1 = Dense(64, activation='relu')(x1)
    x1 = Dense(64, activation='relu')(x1)
    # cat_input is declared as a model input but is not connected to any layer.
    input2 = Input(shape=(vocab_size2,), name='cat_input')
    main_output = Dense(num_classes, activation='softmax', name='main_output')(x1)

    model = Model(inputs=[input1, input2], outputs=[main_output])

    model.compile(loss='categorical_crossentropy',
                  optimizer='adam',
                  metrics=['accuracy'])

    model.summary()
    history = model.fit([x_train, y_train], z_train,
                        batch_size=batch_size,
                        epochs=30,
                        verbose=1,
                        validation_split=0.1)
    score = model.evaluate([x_test, y_test], z_test,
                           batch_size=batch_size, verbose=1)

    print('Test accuracy:', score[1])

    # serialize model to JSON
    model_json = model.to_json()
    with open("./outputs/my_model_" + model_name + ".json", "w") as json_file:
        json_file.write(model_json)
    # creates a HDF5 file 'my_model.h5'
    model.save('./outputs/my_model_' + model_name + '.h5')

    # Save Tokenizer i.e. Vocabulary (only the subject tokenizer is saved)
    with open('./outputs/tokenizer' + model_name + '.pickle', 'wb') as handle:
        pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

def validate_model(model_name, test_data, labels):
    import pickle
    from keras.models import model_from_json

    test_data['Subject'] = test_data['Subject'] + " " + test_data['Description']
    headlines = test_data['Subject'].astype(str)
    categories = test_data['Category'].astype(str)

    # load json and create model
    with open("./outputs/my_model_" + model_name + ".json", 'r') as json_file:
        loaded_model_json = json_file.read()
    model = model_from_json(loaded_model_json)
    # load weights into new model
    model.load_weights('./outputs/my_model_' + model_name + '.h5')
    print("Loaded model from disk")

    # load the pickled tokenizer (the one used for subjects during training)
    with open('./outputs/tokenizer' + model_name + '.pickle', 'rb') as handle:
        tokenizer = pickle.load(handle)
    # Subjects
    x_pred = tokenizer.texts_to_matrix(headlines, mode='tfidf')
    # Categories (encoded with the same subject tokenizer; tokenizer2 was not saved)
    y_pred = tokenizer.texts_to_matrix(categories, mode='tfidf')
    predictions = []
    scores = []
    predictions_vetor = model.predict({'main_input': x_pred, 'cat_input': y_pred})

I read your training code, shown below.

model.fit([x_train,y_train], z_train,
                batch_size=batch_size,
                epochs=30,
                verbose=1,
                validation_split=0.1)

You are using [x_train, y_train] as the features and z_train as the labels for your model. y_train is the raw form of the label and z_train is the encoded form of the label.
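Whether the second input really gives the label away depends on the data: in the question's code, y_train is built from the Category column and z_train from the Activity column. One quick way to check, sketched here with made-up rows and a hypothetical helper named `fraction_determined`, is to measure how often a category value appears with only a single activity. A result close to 1.0 would suggest that the category alone nearly fixes the label.

```python
from collections import defaultdict


def fraction_determined(features, labels):
    """Fraction of rows whose feature value only ever appears with one label."""
    features = list(features)
    labels = list(labels)
    seen = defaultdict(set)
    for feature, label in zip(features, labels):
        seen[feature].add(label)
    if not features:
        return 0.0
    determined = sum(1 for feature in features if len(seen[feature]) == 1)
    return determined / len(features)


# Hypothetical rows standing in for data['Category'] and data['Activity'].
categories = ["billing", "billing", "network", "network", "hardware"]
activities = ["refund", "refund", "reset router", "replace cable", "replace disk"]

print(fraction_determined(categories, activities))  # 0.6
```

Here "billing" and "hardware" each map to one activity (3 of 5 rows), while "network" maps to two, so the category is only a partial hint in this toy data.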

This way you are leaking label information into the training features, which results in overfitting. Your model does not generalise at all, and therefore it predicts irrelevant results.
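A related leak is visible in the question's code itself: the training data is every row of `data`, and the test data is the first 40% of those same rows, so the reported test accuracy is measured on rows the model has already seen. The sketch below shows one way to build disjoint, reproducible train and test index lists using only the standard library. The helper name `split_indices`, the 20% test fraction, and the seed value are assumptions for illustration, not part of the original code.

```python
import random


def split_indices(n_rows, test_fraction=0.2, seed=42):
    """Shuffle row indices with a fixed seed and return disjoint (train, test) lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(10)
print(len(train_idx), len(test_idx))        # 8 2
print(set(train_idx).isdisjoint(test_idx))  # True
```

With a pandas DataFrame, the rows could then be selected with `data.iloc[train_idx]` and `data.iloc[test_idx]`, and the tokenizers fitted on the training rows only. The fixed seed makes the split identical on every run. Some run-to-run variation in the scores is still expected from Keras itself, because layer weights are initialized randomly and training batches are shuffled unless those random sources are seeded as well.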
