TensorFlow model correctly predicting images, but not frames from real time video stream?

Question

Why does my TensorFlow model correctly predict JPG and PNG images but incorrectly predict frames from a real-time video stream? Every frame in the real-time video stream is being incorrectly classified as class 1.

Attempt: I saved a PNG image from the real-time video stream. When I test that saved PNG on its own, the model classifies it correctly. When a similar image arrives as a frame in the real-time video stream, it is classified incorrectly. The PNG images and the video stream frames are visually identical in content (background, lighting conditions, camera angle, etc.).

Structure of my model:

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
rescaling_2 (Rescaling)      (None, 180, 180, 3)       0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 180, 180, 16)      448
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 90, 90, 16)        0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 90, 90, 32)        4640
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 45, 45, 32)        0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 45, 45, 64)        18496
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 22, 22, 64)        0
_________________________________________________________________
flatten_1 (Flatten)          (None, 30976)             0
_________________________________________________________________
dense_2 (Dense)              (None, 128)               3965056
_________________________________________________________________
dense_3 (Dense)              (None, 3)                 387
=================================================================
Total params: 3,989,027
Trainable params: 3,989,027
Non-trainable params: 0
_________________________________________________________________
Found 1068 files belonging to 3 classes.

Realtime prediction code:

import cv2
import numpy as np
 
# define a video capture object
vid = cv2.VideoCapture(0)
 
while(True):
     
    # Capture the video frame
    # by frame
    ret, frame = vid.read()
 
    #reshape frame to prepare for prediction
    frame= cv2.resize(frame, (180,180))
    frame = np.asarray(frame)
    frame = frame.astype('float32')
    frame /= 255.0
    frame2 = np.expand_dims(frame, axis=0)
 
    # #class prediction
    classPrediction = np.argmax(new_model.predict(frame2), axis=-1)
    print(classPrediction)
    print(type(classPrediction))
   
    # #probability prediction
    prediction = new_model.predict(frame2)
    print(prediction)
    print(type(prediction))
 
    position = (10,50)
    cv2.putText(
        frame,  # numpy array on which text is written
        "prediction",  # text
        position,  # position at which writing has to start
        cv2.FONT_HERSHEY_SIMPLEX,  # font family
        1,  # font size
        (209, 80, 0, 255),  # font color
        3)  # font stroke
   
    # Display the resulting frame
    cv2.imshow('frame', frame)
     
    # the 'q' button is set as the
    # quitting button you may use any
    # desired button of your choice
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
 
# After the loop release the cap object
vid.release()
# Destroy all the windows
cv2.destroyAllWindows()

Here I output the prediction probability by class and the class prediction for each frame of the real-time video stream:

Optimization Passes are enabled (registered 2)
7/7 - 3s - loss: 8.6775e-05 - accuracy: 1.0000
Restored model, accuracy: 100.00%
[0]
[[ 1.1169542  -2.9432456  -0.65884787]]
[0]
[[ 1.1167728 -2.9428637 -0.6589765]]
[0]
[[ 1.1169429 -2.943745  -0.6584857]]
[0]
[[ 1.1156341  -2.9435503  -0.65733105]]
[0]
[[ 1.116857  -2.9430726 -0.6590006]]
[0]
[[ 1.1165708  -2.9452283  -0.65692884]]
[0]
[[ 1.1218321 -2.946224  -0.6618703]]

Test model function (working on individual PNG & JPG, but not real time video frames)

def testModel(imageName):
  import cv2
  from PIL import Image
  from tensorflow.keras.preprocessing import image_dataset_from_directory
  batch_size = 32
  img_height = 180
  img_width = 180
  img = keras.preprocessing.image.load_img(
  imageName, target_size=(img_height, img_width)
  )
  img_array = keras.preprocessing.image.img_to_array(img)
  img_array = tf.expand_dims(img_array, 0) #Create a batch
 
  predictions = new_model.predict(img_array)
  score = tf.nn.softmax(predictions[0])
 
  print(
      "This image {} most likely belongs to {} with a {:.2f} percent confidence."
      .format(imageName, class_names[np.argmax(score)], 100 * np.max(score))
  )
 
  im = cv2.imread(imageName)
  im_resized = cv2.resize(im, (224, 224), interpolation=cv2.INTER_LINEAR)
 
#Use model to classify images.
testModel('test.PNG')

EDIT: I tried Micka's suggested fixes (converting to RGB and using INTER_NEAREST in cv2.resize). The same issue is occurring.

    #reshape frame to prepare for prediction
    frame= cv2.resize(frame, (180,180), interpolation= cv2.INTER_NEAREST)
    frame = np.asarray(frame)
    frame = frame.astype('float32')
    frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
    frame /= 255.0
    frame2 = np.expand_dims(frame, axis=0)

EDIT 2: I'm trying to analyze differences in data between videocapture frame data and training data.

#image_dataset_from_directory returns a tf.data.Dataset that yields batches of images from 
#the subdirectories class_a and class_b, together with labels 0 and 1.
from keras.preprocessing import image
directory_test = "/content/test"
tf.keras.utils.image_dataset_from_directory(
    directory_test, labels='inferred', label_mode='int',
    class_names=None, color_mode='rgb', batch_size=32, image_size=(256,
    256), shuffle=True, seed=None, validation_split=None, subset=None,
    interpolation='bilinear', follow_links=False,
    crop_to_aspect_ratio=False
)
 
tf.keras.utils.image_dataset_from_directory(directory_test, labels='inferred')
 
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
  directory_test,
  validation_split=0.2,
  subset="training",
  seed=123,
  image_size=(img_height, img_width),
  batch_size=batch_size)

Is the accuracy being affected by the reshaping in the realtime prediction code? I do not understand why frame predictions are incorrect, but single JPG and PNG image predictions are correct. Thank you for any help!

Answer 1

The real-time predictions are wrong because of the preprocessing. The preprocessing in the inference code should always be the same as the preprocessing used during training. In your code the two paths differ: testModel passes the output of load_img and img_to_array (RGB, values 0 to 255) straight to the model, which has its own Rescaling layer, while the real-time loop feeds OpenCV's BGR frames and divides them by 255 first.

Use tf.keras.preprocessing.image.load_img in your real-time prediction code. It takes an image path, so save each frame under a name such as "sample.png" and pass that path to tf.keras.preprocessing.image.load_img. This should solve the issue. Also use the "bilinear" resize method, because that is what was used for the training data.
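If writing every frame to disk turns out to be too slow, the same preprocessing can probably be reproduced in memory. The following is only a minimal sketch under assumptions the question does not confirm: that the model's Rescaling layer divides by 255 (as in the TensorFlow image classification tutorial), and that the frame has already been resized with something like cv2.resize(frame, (180, 180), interpolation=cv2.INTER_LINEAR). The helper name frame_to_batch is made up for illustration, and a synthetic array stands in for a camera frame.

```python
import numpy as np


def frame_to_batch(frame_bgr):
    """Turn a resized OpenCV frame (H, W, 3), BGR, uint8, into a model input batch.

    Hypothetical helper: it mirrors what load_img + img_to_array would
    hand to the model, without going through a file on disk.
    """
    # OpenCV delivers channels as BGR; the Keras loaders deliver RGB,
    # so reverse the channel axis.
    frame_rgb = frame_bgr[..., ::-1]
    # Cast to float32 but keep the 0-255 range. Assumption: the model's
    # own Rescaling layer performs the division by 255.
    frame_rgb = np.ascontiguousarray(frame_rgb, dtype=np.float32)
    # Add the batch dimension: (H, W, 3) -> (1, H, W, 3).
    return np.expand_dims(frame_rgb, axis=0)


# Synthetic stand-in for a resized camera frame: pure blue in BGR order.
fake_frame = np.zeros((180, 180, 3), dtype=np.uint8)
fake_frame[..., 0] = 255

batch = frame_to_batch(fake_frame)
print(batch.shape, batch.dtype, batch[0, 0, 0])
```

In the capture loop this would replace the astype, `/= 255.0` and expand_dims lines: resize the frame, call frame_to_batch on the result, and pass the batch to new_model.predict. Drawing the text with cv2.putText is then best done on the original uint8 frame rather than on the float array.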
