
Confusion Matrix - ValueError: Found input variables with inconsistent numbers of samples
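I am getting this error again while computing a confusion matrix: ValueError: Found input variables with inconsistent numbers of samples: [16979, 271664]. In my previous post I asked about the same error for a CNN model I had built, and that problem was solved. Now I am using the same code to compute the confusion matrix for a pretrained model on the same dataset, and the error has come back. I don't understand why I'm getting it. It would help if someone could explain the cause and suggest a solution.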
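import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.models import Model
train_datagen = ImageDataGenerator(rescale = 1./255,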
shear_range = 0.2,
zoom_range = 0.2,
horizontal_flip = True,
validation_split=0.2) #validation_data = 20%
test_datagen = ImageDataGenerator(rescale = 1./255)
train_data = train_datagen.flow_from_directory(train_dataset_dir,
target_size = (224, 224),
batch_size = batch_Size,
class_mode = 'categorical',
shuffle = True,
subset = 'training')
valid_data = train_datagen.flow_from_directory(train_dataset_dir,
target_size = (224, 224),
batch_size = batch_Size,
class_mode = 'categorical',
shuffle = True,
subset = 'validation')
test_data = test_datagen.flow_from_directory(test_data_dir,
target_size = (224, 224),
batch_size = batch_Size,
shuffle = False,
class_mode = None)
print(train_data.class_indices)
basemodel = tf.keras.applications.mobilenet.MobileNet()
# don't train existing weights
for layer in basemodel.layers:
    layer.trainable = False
headmodel = basemodel.output
headmodel = Flatten()(headmodel)
headmodel = Dense(128, activation='relu')(headmodel)
#headmodel = Dense(8, activation='relu')(headmodel)
headmodel = Dropout(0.7)(headmodel)
headmodel = Dense(2,activation= 'softmax')(headmodel)
model = Model(inputs=basemodel.input, outputs= headmodel)
# view the structure of the model
model.summary()
model.compile(optimizer='Adam', loss='categorical_crossentropy',metrics=['accuracy'])
STEP_SIZE_TRAIN=train_data.n//train_data.batch_size
STEP_SIZE_VALID=valid_data.n//valid_data.batch_size
history = model.fit(train_data,
steps_per_epoch=STEP_SIZE_TRAIN,
validation_data=valid_data,
validation_steps=STEP_SIZE_VALID,
epochs=10
)
#predict output
import numpy as np
STEP_SIZE_TEST=test_data.n//test_data.batch_size
test_data.reset()
y_pred=model.predict_generator(test_data,
steps=STEP_SIZE_TEST,
verbose=1)
predicted_class=np.argmax(y_pred,axis=1)
print(predicted_class)
#Confusion Matrix and Classification Report
import sklearn.metrics as metrics
true_class = test_data.classes
#true_class = tf.concat([y for y in test_data], axis=0)
print(true_class.shape)
print(predicted_class.shape)
print(true_class)
class_labels = list(test_data.class_indices.keys())
print(class_labels)
print('Confusion Matrix')
cm = metrics.confusion_matrix(true_class, predicted_class)
print(cm)
(16979,)
(271664,)
tf.Tensor([0 0 0 ... 1 1 1], shape=(16979,), dtype=int32)
['closed', 'open']
Confusion Matrix
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_15624/962738184.py in <module>
13
14 print('Confusion Matrix')
---> 15 cm = metrics.confusion_matrix(true_class, predicted_class)
16 print(cm)
17
p:\conda\envs\PRML\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74
p:\conda\envs\PRML\lib\site-packages\sklearn\metrics\_classification.py in confusion_matrix(y_true, y_pred, labels, sample_weight, normalize)
274
275 """
--> 276 y_type, y_true, y_pred = _check_targets(y_true, y_pred)
277 if y_type not in ("binary", "multiclass"):
278 raise ValueError("%s is not supported" % y_type)
p:\conda\envs\PRML\lib\site-packages\sklearn\metrics\_classification.py in _check_targets(y_true, y_pred)
79 y_pred : array or indicator matrix
80 """
---> 81 check_consistent_length(y_true, y_pred)
82 type_true = type_of_target(y_true)
83 type_pred = type_of_target(y_pred)
p:\conda\envs\PRML\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
254 if len(uniques) > 1:
255 raise ValueError("Found input variables with inconsistent numbers of"
--> 256 " samples: %r" % [int(l) for l in lengths])
257
258
ValueError: Found input variables with inconsistent numbers of samples: [16979, 271664]
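The last frames of the traceback show where this comes from: `confusion_matrix` passes both arguments to `check_consistent_length`, which requires exactly one prediction per true label. The list in the message holds the two lengths it found, here 16979 labels from `test_data.classes` and 271664 predicted classes (matching the two shapes printed above). A minimal reproduction with made-up toy labels, assuming scikit-learn is installed:

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1]  # toy ground-truth labels: 3 samples
y_pred = [0, 1]     # toy predictions: only 2 samples

message = None
try:
    confusion_matrix(y_true, y_pred)
except ValueError as err:
    message = str(err)
print(message)  # Found input variables with inconsistent numbers of samples: [3, 2]

# With one prediction per label, the same call succeeds.
cm = confusion_matrix([0, 1, 1], [0, 1, 0])
print(cm)  # rows = true class, columns = predicted class
```

So the fix is never on the scikit-learn side: the prediction step has to produce exactly as many outputs as there are test images.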
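Try printing STEP_SIZE_TEST; you will find it is 271664, which is probably incorrect. In model.predict_generator, don't specify steps; they will be computed internally. I'm not sure which version of TF you are using, but predict_generator is deprecated, and you can use model.predict instead, since it now handles generators. If you really want to compute the step size yourself, choose it so that
batch_size x steps = number of test samples.
This ensures you go through the test set exactly once. Below is some code that computes the values: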
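length = test_data.n  # number of test images (len(test_data) would return the number of batches instead)
# largest divisor of length that is <= 80, so that batch size x steps == length exactly
test_batch_size = max(length // n for n in range(1, length + 1) if length % n == 0 and length // n <= 80)
test_steps = length // test_batch_size
print('test batch size:', test_batch_size, ' test steps:', test_steps)
# then use batch_size=test_batch_size in flow_from_directory and steps=test_steps when predicting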
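An alternative to searching for a divisor is to round the step count up and let the last batch be partial. A Keras directory iterator already does this (`len(test_data)` is the number of batches rounded up), so letting `model.predict` infer the steps, as suggested above, should cover every image once. The sketch below only does the arithmetic and does not touch Keras; `batch_size = 16` is a hypothetical value, because the question never shows `batch_Size`. It also shows that the question's `n // batch_size` silently drops the final partial batch:

```python
def steps_for_one_pass(n_samples: int, batch_size: int) -> int:
    """Batches needed to see every sample once; the last batch may be partial."""
    return (n_samples + batch_size - 1) // batch_size  # integer ceiling division


def samples_seen(n_samples: int, batch_size: int, steps: int) -> int:
    """Samples yielded by `steps` batches in a single, non-repeating pass.

    Simplified model: every batch is full except possibly the last one,
    and iteration never wraps around to the start of the data.
    """
    return min(steps * batch_size, n_samples)


n_test = 16979   # number of test images, from the question's printed shape
batch_size = 16  # hypothetical: the question never shows batch_Size

floor_steps = n_test // batch_size                  # how STEP_SIZE_TEST is computed
ceil_steps = steps_for_one_pass(n_test, batch_size)

print(floor_steps, samples_seen(n_test, batch_size, floor_steps))  # 1061 16976
print(ceil_steps, samples_seen(n_test, batch_size, ceil_steps))    # 1062 16979
```

Rounding up works for any batch size, whereas the divisor search falls back to a batch size of 1 whenever the sample count has no divisor between 2 and 80 (for example, when it is prime). Either way, keeping `shuffle=False` on the test generator is what should make the predictions line up with `test_data.classes`.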