Get class labels from Keras functional model

I have a functional model in Keras (ResNet50 from the repo examples). I trained it on data from ImageDataGenerator and flow_from_directory and saved the model to an .h5 file. When I call model.predict I get an array of class probabilities, but I want to associate them with class labels (in my case, the folder names). How can I get them? I found that I could use model.predict_classes and model.predict_proba, but these functions are not available on a functional model, only on Sequential.

y_prob = model.predict(x) 
y_classes = y_prob.argmax(axis=-1)

As suggested here.

When one uses flow_from_directory, the problem is how to interpret the probability outputs. That is, how do the probability outputs map to the class labels, given that the way flow_from_directory creates the one-hot vectors is not known in advance?

We can get a dictionary that maps the class labels to the indices of the prediction vector we get as output by using:

generator= train_datagen.flow_from_directory("train", batch_size=batch_size)
label_map = (generator.class_indices)

The label_map variable is a dictionary like this:

{'class_14': 5, 'class_10': 1, 'class_11': 2, 'class_12': 3, 'class_13': 4, 'class_2': 6, 'class_3': 7, 'class_1': 0, 'class_6': 10, 'class_7': 11, 'class_4': 8, 'class_5': 9, 'class_8': 12, 'class_9': 13}

From this, the relation between the probability scores and the class names can be derived.
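As a minimal sketch of that derivation: invert the dictionary so it maps index to name, then look up the index of the largest probability in each row. The label_map and y_prob values below are made-up stand-ins for generator.class_indices and the output of model.predict, and predicted_labels is a hypothetical helper, not part of Keras.

```python
# Hypothetical stand-in for generator.class_indices
label_map = {"cats": 0, "dogs": 1, "horses": 2}

# Invert it: prediction index -> class name
index_to_label = {index: name for name, index in label_map.items()}

# Hypothetical stand-in for model.predict(x): one row of probabilities per sample
y_prob = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
]


def predicted_labels(probabilities, index_to_label):
    """Return the class name with the highest probability for each row."""
    labels = []
    for row in probabilities:
        best_index = max(range(len(row)), key=row.__getitem__)
        labels.append(index_to_label[best_index])
    return labels


predicted = predicted_labels(y_prob, index_to_label)
print(predicted)
```

With a NumPy array from model.predict, y_prob.argmax(axis=-1) gives the same indices, and index_to_label turns them into names.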

Basically, you can create this dictionary with this code:

from glob import glob
class_names = glob("*") # Reads all the folders in which images are present
class_names = sorted(class_names) # Sorting them
name_id_map = dict(zip(class_names, range(len(class_names))))

The variable name_id_map in the above code contains the same dictionary as the one obtained from the class_indices attribute of the generator returned by flow_from_directory.
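One thing to watch with the glob version is that glob("*") also picks up stray files in the current directory. Here is a rough sketch that keeps only subfolders, assuming (as described above) that the indices follow the sorted order of the folder names. folder_class_indices is a hypothetical helper, and the temporary directory with class_1 to class_14 is made up to mirror the dictionary shown earlier.

```python
import os
import tempfile


def folder_class_indices(root):
    """Map each subfolder name under root to its position in sorted order."""
    names = sorted(
        entry for entry in os.listdir(root)
        if os.path.isdir(os.path.join(root, entry))
    )
    return {name: index for index, name in enumerate(names)}


# Build a throwaway directory layout to try it on (made-up data)
with tempfile.TemporaryDirectory() as root:
    for n in range(1, 15):
        os.mkdir(os.path.join(root, "class_%d" % n))
    # A stray file that must not become a class
    open(os.path.join(root, "notes.txt"), "w").close()
    name_id_map = folder_class_indices(root)

print(name_id_map["class_2"], name_id_map["class_10"])
```

Note that the sort is on strings, so class_10 comes before class_2. That is why class_2 ends up at index 6 in the dictionary above.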

Hope this helps!

UPDATE: This is no longer valid for newer Keras versions. Please use argmax() as in the answer from Emilia Apostolova.

Functional API models have only the predict() function, which for classification returns the class probabilities. You can then select the most probable classes using the probas_to_classes() utility function. Example:

from keras.utils import np_utils  # Keras 1.x only; removed in later versions

y_proba = model.predict(x)
y_classes = np_utils.probas_to_classes(y_proba)

This is equivalent to model.predict_classes(x) on the Sequential model.

The reason for this is that the functional API supports a more general class of tasks, where predict_classes() would not make sense.

More info: https://github.com/fchollet/keras/issues/2524

In addition to @Emilia Apostolova's answer: to get the ground-truth labels from

generator = train_datagen.flow_from_directory("train", batch_size=batch_size)

just call

y_true_labels = generator.classes
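For illustration, once you have the ground-truth indices you can compare them with the argmax of the predictions. The lists below are made-up stand-ins for generator.classes and y_prob.argmax(axis=-1). This only lines up if the generator was created with shuffle=False, since generator.classes is in file order.

```python
# Hypothetical stand-in for generator.classes (ground-truth class indices)
y_true_labels = [0, 0, 1, 2, 2]

# Hypothetical stand-in for model.predict(...).argmax(axis=-1)
y_classes = [0, 1, 1, 2, 0]

# Count positions where prediction and ground truth agree
correct = sum(1 for truth, pred in zip(y_true_labels, y_classes) if truth == pred)
accuracy = correct / len(y_true_labels)

print("accuracy: %.2f" % accuracy)
```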

You must use the label index you already have. Here is what I do for text classification:

# data labels = [1, 2, 1...]
labels_index = { "website" : 0, "money" : 1 ....} 
# to feed model
label_categories = to_categorical(np.asarray(labels)) 

Then, for predictions:

texts = ["hello, rejoins moi sur skype", "bonjour comment ça va ?", "tu me donnes de l'argent"]
sequences = tokenizer.texts_to_sequences(texts)
data = pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
predictions = model.predict(data)

# Look up each score by the label's index instead of relying on dict iteration order
for text, scores in zip(texts, predictions):
    print("Prediction for \"%s\": " % text)
    for label, index in labels_index.items():
        print("\t%s ==> %f" % (label, scores[index]))

This gives:

Prediction for "hello, rejoins moi sur skype": 
    website ==> 0.759483
    money ==> 0.037091
    under ==> 0.010587
    camsite ==> 0.114436
    email ==> 0.075975
    abuse ==> 0.002428
Prediction for "bonjour comment ça va ?": 
    website ==> 0.433079
    money ==> 0.084878
    under ==> 0.048375
    camsite ==> 0.036674
    email ==> 0.369197
    abuse ==> 0.027798
Prediction for "tu me donnes de l'argent": 
    website ==> 0.006223
    money ==> 0.095308
    under ==> 0.003586
    camsite ==> 0.003115
    email ==> 0.884112
    abuse ==> 0.007655

It is possible to save a "list" of labels directly in the Keras model. This way, a user who runs the model for predictions and has no other source of information can perform the lookup themselves. Here is a dummy example of how one can perform an "injection" of labels:

import tensorflow as tf
from tensorflow.keras.layers import Input, Flatten, Dense, Lambda

# assume we get labels as a list
labels = ["cat", "dog", "horse", "tomato"]
# here we start building our model with a 299x299 input image and one output layer
xx = Input(shape=(299, 299, 3))
flat = Flatten()(xx)
output = Dense(len(labels))(flat)
# here we perform injection of labels: repeat the label row once per sample in the batch
def inject_labels(x):
    tf_labels = tf.constant([labels], dtype=tf.string)
    return tf.tile(tf_labels, [tf.shape(x)[0], 1])
output_labels = Lambda(inject_labels, name="label_injection")(xx)
# and finally create the model
model = tf.keras.Model(xx, [output, output_labels])

When used for prediction, this model returns a tensor of scores and a tensor of string labels. A model like this can be saved to h5, in which case the file contains the labels. The model can also be exported to saved_model and used for serving in the cloud.

To map predicted classes and filenames using ImageDataGenerator, I use:

# Data generator and prediction
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
        inputpath,
        target_size=(150, 150),
        batch_size=20,
        class_mode='categorical',
        shuffle=False)
pred = model.predict_generator(test_generator, steps=len(test_generator), verbose=0)
# Get classes by max element in np (as a list)
classes = list(np.argmax(pred, axis=1))
# Get filenames (shuffle=False in the generator is important so the order matches the predictions)
filenames = test_generator.filenames

I can loop over the predicted classes and the associated filenames using:

for f in zip(classes, filenames):
    ...

(model.predict(x_test)).argmax(axis=-1)
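To tie the pieces together, here is a small sketch that pairs each filename with a predicted class name instead of a bare index. The class_indices, classes and filenames values are made up. In practice they would come from test_generator.class_indices, the argmax of the predictions, and test_generator.filenames.

```python
# Hypothetical stand-ins for the generator attributes and the argmax output
class_indices = {"cats": 0, "dogs": 1}
classes = [1, 0, 1]
filenames = ["cats/001.jpg", "cats/002.jpg", "dogs/001.jpg"]

# Invert the mapping so a predicted index can be turned back into a folder name
index_to_label = {index: name for name, index in class_indices.items()}

# One (filename, predicted label) pair per image, in generator order
results = [
    (filename, index_to_label[class_index])
    for class_index, filename in zip(classes, filenames)
]

for filename, label in results:
    print("%s -> %s" % (filename, label))
```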
