TensorFlow model gets zero loss
```python
import tensorflow as tf  # TensorFlow 1.x API (tf.contrib, tf.Session)
import numpy as np
import os
import re
import PIL


def read_image_label_list(img_directory, folder_name):
    # Input:
    #   - Name of folder (test or train)
    # Output:
    #   - List of names of files in folder
    #   - Label associated with each file
    cat_label = 1
    dog_label = 0
    filenames = []
    labels = []
    dir_list = os.listdir(os.path.join(img_directory, folder_name))  # List of all image names in 'folder_name' folder
    # Loop through all images in directory
    for i, d in enumerate(dir_list):
        if re.search("train", folder_name):
            if re.search("cat", d):  # True if the image filename contains 'cat'
                labels.append(cat_label)
            else:
                labels.append(dog_label)
        filenames.append(os.path.join(img_directory, folder_name, d))
    return filenames, labels


# Define convolutional layer
def conv_layer(input, channels_in, channels_out):
    w_1 = tf.get_variable("weight_conv", [5, 5, channels_in, channels_out], initializer=tf.contrib.layers.xavier_initializer())
    b_1 = tf.get_variable("bias_conv", [channels_out], initializer=tf.zeros_initializer())
    conv = tf.nn.conv2d(input, w_1, strides=[1, 1, 1, 1], padding="SAME")
    activation = tf.nn.relu(conv + b_1)
    return activation


# Define fully connected layer
def fc_layer(input, channels_in, channels_out):
    w_2 = tf.get_variable("weight_fc", [channels_in, channels_out], initializer=tf.contrib.layers.xavier_initializer())
    b_2 = tf.get_variable("bias_fc", [channels_out], initializer=tf.zeros_initializer())
    activation = tf.nn.relu(tf.matmul(input, w_2) + b_2)
    return activation


# Parse function: decode an image file and build its label
def _parse_function(img_path, label):
    img_file = tf.read_file(img_path)
    img_decoded = tf.image.decode_image(img_file, channels=3)
    img_decoded.set_shape([None, None, 3])
    img_decoded = tf.image.resize_images(img_decoded, (28, 28), method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    img_decoded = tf.image.per_image_standardization(img_decoded)
    img_decoded = tf.cast(img_decoded, dtype=tf.float32)
    label = tf.one_hot(label, 1)
    return img_decoded, label


tf.reset_default_graph()

# Define parameters
EPOCHS = 10
BATCH_SIZE_training = 64
learning_rate = 0.001
img_dir = 'C:/Users/tharu/PycharmProjects/cat_vs_dog/data'
batch_size = 128

# Define data
features, labels = read_image_label_list(img_dir, "train")

# Define dataset
dataset = tf.data.Dataset.from_tensor_slices((features, labels))  # Takes slices in 0th dimension
dataset = dataset.map(_parse_function)
dataset = dataset.batch(batch_size)
iterator = dataset.make_initializable_iterator()

# Get next batch of data from iterator
x, y = iterator.get_next()

# Create the network (use different variable scopes for reuse of variables)
with tf.variable_scope("conv1"):
    conv_1 = conv_layer(x, 3, 32)
    pool_1 = tf.nn.max_pool(conv_1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
with tf.variable_scope("conv2"):
    conv_2 = conv_layer(pool_1, 32, 64)
    pool_2 = tf.nn.max_pool(conv_2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
    flattened = tf.contrib.layers.flatten(pool_2)
with tf.variable_scope("fc1"):
    fc_1 = fc_layer(flattened, 7 * 7 * 64, 1024)
with tf.variable_scope("fc2"):
    logits = fc_layer(fc_1, 1024, 1)

# Define loss function
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=tf.cast(y, dtype=tf.int32)))

# Define optimizer
train = tf.train.AdamOptimizer(learning_rate).minimize(loss)

with tf.Session() as sess:
    # Initialize all the variables
    sess.run(tf.global_variables_initializer())
    # Train the network
    for i in range(EPOCHS):
        # Initialize iterator so that it starts at beginning of training set for each epoch
        sess.run(iterator.initializer)
        print("EPOCH", i)
        while True:
            try:
                _, epoch_loss = sess.run([train, loss])
            except tf.errors.OutOfRangeError:  # Raised when the iterator runs out of data
                if i % 2 == 0:
                    # [train_accuracy] = sess.run([accuracy])
                    # print("Step ", i, "training accuracy = %{}".format(train_accuracy))
                    print(epoch_loss)
                break
```
I've spent a few hours trying to figure out systematically why I've been getting 0 loss when I run this model.

Initially I thought it was because there was something wrong with my data. But I've viewed the data after resizing, and the images seem fine.

Then I tried a few different loss functions, because I thought maybe I was misunderstanding what the TensorFlow function `softmax_cross_entropy` does, but that didn't fix anything.

I've tried running just the `logits` section to see what the output is. This is just a small sample, and the numbers seem fine to me:
```
[[0.06388957]
 [0.        ]
 [0.16969752]
 [0.24913025]
 [0.09961276]]
```
Surely, then, the `softmax_cross_entropy` function should be able to compute this loss, given that the corresponding labels are 0 or 1? I'm not sure if I'm missing something. Any help would be greatly appreciated.
As documented: "`logits` and `labels` must have the same shape, e.g. `[batch_size, num_classes]`, and the same dtype (either `float16`, `float32`, or `float64`)."
Since you mentioned your label is a "[Batch_size, 1] one_hot vector", I would assume both your `logits` and `labels` have shape `[Batch_size, 1]`. This will certainly lead to zero loss. Conceptually speaking, you have only 1 class (`num_classes=1`), so you cannot be wrong (`loss=0`).
So at least for your `labels`, you should transform them: `tf.one_hot(indices=labels, depth=num_classes)`. Your prediction `logits` should also have shape `[batch_size, num_classes]`.
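The dense fix can be sketched the same way, again in plain NumPy and assuming two classes (dog = 0, cat = 1, matching `read_image_label_list`). `one_hot` and `softmax_cross_entropy` are hypothetical stand-ins for `tf.one_hot` and `tf.nn.softmax_cross_entropy_with_logits`, and the logits are invented. In the question's TensorFlow code, the corresponding changes would presumably be `tf.one_hot(label, 2)` in `_parse_function` and a final `fc_layer(fc_1, 1024, 2)`.

```python
import numpy as np

NUM_CLASSES = 2  # assumption: dog = 0, cat = 1


def one_hot(indices, depth):
    """Hypothetical stand-in for tf.one_hot(indices=..., depth=...).

    Unlike tf.one_hot, this raises IndexError for out-of-range indices
    instead of returning an all-zero row.
    """
    return np.eye(depth)[indices]


def softmax_cross_entropy(logits, labels):
    """Row-wise softmax cross-entropy; shapes [batch_size, num_classes]."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_softmax).sum(axis=1)


labels = np.array([0, 1, 1, 0])              # integer class ids, shape [4]
dense_labels = one_hot(labels, NUM_CLASSES)  # shape [4, 2]

# Invented logits from a final layer with NUM_CLASSES outputs.
logits = np.array([[2.0, 0.5], [0.1, 1.5], [1.0, 1.0], [0.3, -0.7]])

losses = softmax_cross_entropy(logits, dense_labels)
print(dense_labels.shape, logits.shape)  # (4, 2) (4, 2)
print(losses.mean() > 0)                 # True: the loss is no longer trivially zero
```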
Alternatively, you can use `sparse_softmax_cross_entropy_with_logits`, where: "A common use case is to have logits of shape `[batch_size, num_classes]` and labels of shape `[batch_size]`. But higher dimensions are supported."
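The sparse variant can be sketched in the same way. `log_softmax` and `sparse_softmax_cross_entropy` are hypothetical NumPy helpers that mirror the math of `tf.nn.sparse_softmax_cross_entropy_with_logits`. The labels stay plain integer class ids of shape `[batch_size]`, and the result matches the dense one-hot computation. In the question's code, this would presumably mean dropping the `tf.one_hot` call from `_parse_function` and keeping the integer label, while the logits still need `num_classes` columns.

```python
import numpy as np


def log_softmax(logits):
    """Row-wise log-softmax for logits of shape [batch_size, num_classes]."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def sparse_softmax_cross_entropy(logits, labels):
    """Cross-entropy with integer labels of shape [batch_size]."""
    # Pick the log-probability of the true class in each row.
    return -log_softmax(logits)[np.arange(len(labels)), labels]


labels = np.array([0, 1, 1, 0])  # integer class ids, no one-hot encoding
logits = np.array([[2.0, 0.5], [0.1, 1.5], [1.0, 1.0], [0.3, -0.7]])

sparse_losses = sparse_softmax_cross_entropy(logits, labels)

# The same loss computed the dense way, with one-hot labels of shape [4, 2].
dense_losses = -(np.eye(2)[labels] * log_softmax(logits)).sum(axis=1)

print(sparse_losses.shape)                       # (4,)
print(np.allclose(sparse_losses, dense_losses))  # True
```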