Getting error for input variables with inconsistent numbers of samples while computing confusion matrix

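I am getting this error again while computing a confusion matrix: ValueError: Found input variables with inconsistent numbers of samples: [16979, 271664]. In my previous post I asked about it while building a CNN model, and that problem was solved. Now I am using the same code to compute the confusion matrix for a pre-trained model on the same dataset, and the error has come back. I don't know why I am getting it; it would help if someone could explain the cause and suggest a solution.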

    train_datagen = ImageDataGenerator(rescale = 1./255,
                                   shear_range = 0.2,
                                   zoom_range = 0.2,
                                   horizontal_flip = True,
                                   validation_split=0.2)  #validation_data = 20%

    test_datagen = ImageDataGenerator(rescale = 1./255)

    train_data = train_datagen.flow_from_directory(train_dataset_dir,
                                                 target_size = (224, 224),
                                                 batch_size = batch_Size,
                                                 class_mode = 'categorical',
                                                 shuffle = True,
                                                 subset = 'training')

    valid_data = train_datagen.flow_from_directory(train_dataset_dir,
                                                 target_size = (224, 224),
                                                 batch_size = batch_Size,
                                                 class_mode = 'categorical',
                                                 shuffle = True,
                                                 subset = 'validation')

    test_data = test_datagen.flow_from_directory(test_data_dir,
                                            target_size = (224, 224),
                                            batch_size = batch_Size,
                                            shuffle = False,
                                            class_mode = None)
    print(train_data.class_indices)
    basemodel = tf.keras.applications.mobilenet.MobileNet()
    # don't train existing weights
    for layer in basemodel.layers:
     layer.trainable = False
    
    headmodel = basemodel.output
    headmodel = Flatten()(headmodel)
    headmodel = Dense(128, activation='relu')(headmodel)

    #headmodel = Dense(8, activation='relu')(headmodel)
    headmodel = Dropout(0.7)(headmodel)

    headmodel = Dense(2,activation= 'softmax')(headmodel)

    model = Model(inputs=basemodel.input, outputs= headmodel)

    # view the structure of the model
    model.summary()
    model.compile(optimizer='Adam', loss='categorical_crossentropy',metrics=['accuracy'])

    STEP_SIZE_TRAIN=train_data.n//train_data.batch_size
    STEP_SIZE_VALID=valid_data.n//valid_data.batch_size
    history = model.fit(train_data,
                    steps_per_epoch=STEP_SIZE_TRAIN,
                    validation_data=valid_data,
                    validation_steps=STEP_SIZE_VALID,
                    epochs=10
    )





    #predict output
    import numpy as np

    STEP_SIZE_TEST=test_data.n//test_data.batch_size
    test_data.reset()
    y_pred=model.predict_generator(test_data,
    steps=STEP_SIZE_TEST,
    verbose=1)

    predicted_class=np.argmax(y_pred,axis=1)
    print(predicted_class)

    #Confusion Matrix and Classification Report
    import sklearn.metrics as metrics


    true_class = test_data.classes
    #true_class = tf.concat([y for y in test_data], axis=0)
    print(true_class.shape)
    print(predicted_class.shape)

    print(true_class)
    class_labels = list(test_data.class_indices.keys())  
    print(class_labels)

    print('Confusion Matrix')
    cm = metrics.confusion_matrix(true_class, predicted_class)
    print(cm)

(16979,)
(271664,)
tf.Tensor([0 0 0 ... 1 1 1], shape=(16979,), dtype=int32)
['closed', 'open']
Confusion Matrix
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15624/962738184.py in <module>
     13 
     14 print('Confusion Matrix')
---> 15 cm = metrics.confusion_matrix(true_class, predicted_class)
     16 print(cm)
     17 

p:\conda\envs\PRML\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

p:\conda\envs\PRML\lib\site-packages\sklearn\metrics\_classification.py in confusion_matrix(y_true, y_pred, labels, sample_weight, normalize)
    274 
    275     """
--> 276     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    277     if y_type not in ("binary", "multiclass"):
    278         raise ValueError("%s is not supported" % y_type)

p:\conda\envs\PRML\lib\site-packages\sklearn\metrics\_classification.py in _check_targets(y_true, y_pred)
     79     y_pred : array or indicator matrix
     80     """
---> 81     check_consistent_length(y_true, y_pred)
     82     type_true = type_of_target(y_true)
     83     type_pred = type_of_target(y_pred)

p:\conda\envs\PRML\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
    254     if len(uniques) > 1:
    255         raise ValueError("Found input variables with inconsistent numbers of"
--> 256                          " samples: %r" % [int(l) for l in lengths])
    257 
    258 

ValueError: Found input variables with inconsistent numbers of samples: [16979, 271664]


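Try printing STEP_SIZE_TEST; you will find that it is 271664, which is probably not correct. Do not specify steps in model.predict_generator; the value will be computed internally. I'm not sure which version of TF you are using, but predict_generator is deprecated, and you can use model.predict instead, since it now accepts generators. If you really want to compute the number of steps yourself, it should satisfy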

batch_size × steps = number of test samples

Choosing the two values this way ensures you go through the test set exactly once. Here is some code that computes them:

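    # number of test samples; len(test_data) would return the number of batches instead
    length = test_data.n
    # largest batch size <= 80 that divides the number of samples evenly
    test_batch_size = sorted([length // n for n in range(1, length + 1) if length % n == 0 and length / n <= 80], reverse=True)[0]
    test_steps = length // test_batch_size
    # use test_batch_size as batch_size when creating the test generator
    print('test batch size:', test_batch_size, ' test steps:', test_steps)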
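
If no convenient divisor exists, another option is to keep the batch size and round the number of steps up, since the last batch from a Keras directory iterator is simply smaller than the others. The sketch below is plain Python and only illustrates the arithmetic. The helper names `steps_for_one_pass` and `check_same_length` are made up for this example, and the batch size of 32 is an assumption, because the question does not show the value of `batch_Size`.

```python
def steps_for_one_pass(n_samples: int, batch_size: int) -> int:
    """Batches needed to see every sample exactly once (ceiling division)."""
    return (n_samples + batch_size - 1) // batch_size


def check_same_length(y_true, y_pred) -> None:
    """Fail early with a readable message if labels and predictions differ in count."""
    if len(y_true) != len(y_pred):
        raise ValueError(
            f"{len(y_true)} labels but {len(y_pred)} predictions: "
            "check the steps argument passed to predict"
        )


n_samples = 16979   # number of test samples reported in the question
batch_size = 32     # assumed value, not taken from the question

floor_steps = n_samples // batch_size                    # what STEP_SIZE_TEST computes
ceil_steps = steps_for_one_pass(n_samples, batch_size)

# Floor division drops the final partial batch, so some samples get no prediction.
print("samples covered with floor division:", floor_steps * batch_size)
print("steps needed to cover every sample:", ceil_steps)
```

Calling `check_same_length(true_class, predicted_class)` just before `metrics.confusion_matrix` gives the same ValueError, but with a message that points at the prediction step rather than at scikit-learn's internals.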

