Inception: How to process image to use with Inception

I want TensorFlow's Inception v3 to give out tags for an image. My goal is to convert a JPEG image into the input that the Inception neural network accepts. I don't know how to process the image first so that it can run with Google's Inception v3 model. The original TensorFlow project is here: https://github.com/tensorflow/models/tree/master/inception

Originally, all the images are in a dataset, and the entire dataset is first passed to input() or distorted_inputs() in ImageProcessing.py. The images in the dataset are processed and passed to the train() or eval() methods (both of these work). The problem is that I want a function that prints out tags for one specific image, not a whole dataset.
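For reference, the batch pipeline in that repository looks roughly like this, where inference is the function shown below (a sketch from memory of the repo's eval path; the image_processing.inputs call and the +1 for a background class are assumptions about that codebase, not something verified here):

images, labels = image_processing.inputs(dataset)
logits, _ = inference(images, num_classes=dataset.num_classes() + 1)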

Below is the code for the inference function that is used to generate tags with Google Inception. The inception_v4 function is a convolutional neural network implemented in TensorFlow.

def inference(images, num_classes, for_training=False, restore_logits=True,
              scope=None):
  """Build Inception v3 model architecture.

  See here for reference: http://arxiv.org/abs/1512.00567

  Args:
    images: Images returned from inputs() or distorted_inputs().
    num_classes: number of classes
    for_training: If set to `True`, build the inference model for training.
      Kernels that operate differently for inference during training
      e.g. dropout, are appropriately configured.
    restore_logits: whether or not the logits layers should be restored.
      Useful for fine-tuning a model with different num_classes.
    scope: optional prefix string identifying the ImageNet tower.

  Returns:
    Logits. 2-D float Tensor.
    Auxiliary Logits. 2-D float Tensor of side-head. Used for training only.
  """
  # Parameters for BatchNorm.
  batch_norm_params = {
      # Decay for the moving averages.
      'decay': BATCHNORM_MOVING_AVERAGE_DECAY,
      # epsilon to prevent 0s in variance.
      'epsilon': 0.001,
  }
  # Set weight_decay for weights in Conv and FC layers.
  with slim.arg_scope([slim.ops.conv2d, slim.ops.fc], weight_decay=0.00004):
    with slim.arg_scope([slim.ops.conv2d],
                        stddev=0.1,
                        activation=tf.nn.relu,
                        batch_norm_params=batch_norm_params):
      logits, endpoints = inception_v4(
          images,
          dropout_keep_prob=0.8,
          num_classes=num_classes,
          is_training=for_training,
          scope=scope)

  # Add summaries for viewing model statistics on TensorBoard.
  _activation_summaries(endpoints)

  # Grab the logits associated with the side head. Employed during training.
  auxiliary_logits = endpoints['AuxLogits']

  return logits, auxiliary_logits
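To make the call signature concrete, here is a minimal sketch of feeding a single preprocessed image through this function (the placeholder shape, the num_classes value of 1001 for ImageNet plus a background class, and the checkpoint restore step are assumptions, not part of the question):

import tensorflow as tf

# One 299x299 RGB image, values scaled to [-1, 1].
images = tf.placeholder(tf.float32, shape=[1, 299, 299, 3])
logits, _ = inference(images, num_classes=1001, for_training=False)
probs = tf.nn.softmax(logits)

with tf.Session() as sess:
    # checkpoint_path and image_data stand in for your own values.
    tf.train.Saver().restore(sess, checkpoint_path)
    tags = sess.run(probs, feed_dict={images: image_data})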

This is my attempt to process the image before it is passed to the inference function.

  def process_image(self, image_path):
    filename_queue = tf.train.string_input_producer(image_path)
    reader = tf.WholeFileReader()
    key, value = reader.read(filename_queue)

    img = tf.image.decode_jpeg(value)
    height = self.image_size
    width = self.image_size
    image_data = tf.cast(img, tf.float32)
    image_data = tf.reshape(image_data, shape=[1, height, width, 3])
    return image_data

I wanted to process an image file simply so that I could pass it to the inference function, and have that inference print out the tags. The above code didn't work and printed this error:

ValueError: Shape () must have rank at least 1

I would appreciate it if anyone could provide any insight into this problem.
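For what it's worth, the ValueError above typically comes from tf.train.string_input_producer, which requires a rank-1 list of filename strings rather than a bare string. Wrapping the path in a list and resizing (rather than reshaping) to the model's input size would be a minimal fix, sketched here under those assumptions:

def process_image(self, image_path):
    # string_input_producer needs a 1-D list of filenames, not a scalar string.
    filename_queue = tf.train.string_input_producer([image_path])
    reader = tf.WholeFileReader()
    _, value = reader.read(filename_queue)

    img = tf.image.decode_jpeg(value, channels=3)
    # Resize handles arbitrary JPEG dimensions; a bare reshape would fail
    # unless the file were already exactly image_size x image_size.
    img = tf.image.resize_images(tf.cast(img, tf.float32),
                                 [self.image_size, self.image_size])
    return tf.reshape(img, [1, self.image_size, self.image_size, 3])

Note that the reader only returns once tf.train.start_queue_runners(sess) has been called in the session.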

Inception just needs (299,299,3) images with inputs scaled between -1 and 1. See the code below. I just convert the images using this and put them in a TFRecord (and then a queue) to run my stuff.

from PIL import Image
import PIL
import numpy as np

def load_image(self, image_path):
    img = Image.open(image_path)
    # Resize to the 299x299 input Inception expects and force 3-channel RGB.
    newImg = img.resize((299, 299), PIL.Image.BILINEAR).convert("RGB")
    data = np.array(newImg.getdata())
    # Reshape the flat pixel list to (299, 299, 3), then scale [0, 255] to [-1, 1].
    return 2 * (data.reshape((newImg.size[0], newImg.size[1], 3)).astype(np.float32) / 255) - 1
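One usage note (an addition, not part of the original answer): load_image returns a single (299, 299, 3) array, so a batch dimension still has to be added before feeding it to a graph that expects NHWC input:

img = loader.load_image("/tmp/example.jpg")  # loader and path are hypothetical
batch = np.expand_dims(img, axis=0)          # shape (1, 299, 299, 3)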
