Keras VGG16 predict speed slow

I'm working on a feature extractor for a personal transfer learning project, and the predict function of Keras's VGG16 model seems pretty slow (31 seconds for a batch of 4 images). I do expect it to be slow, but I'm not sure whether the predict function is slower than it should be.

data = DataGenerator() 
data = data.from_csv(csv_path=csv_file,
                     img_dir=img_folder,
                     batch_size=batch)

#####################################################
conv_base = VGG16(include_top=False, 
                  weights='imagenet', 
                  input_shape=(480, 640, 3))

model = Sequential()
model.add(conv_base)
model.add(MaxPooling2D(pool_size=(3, 4)))
model.add(Flatten())
######################################################

for inputs, y in data:
    feature_batch = model.predict(inputs)

    yield feature_batch, y

So, my hunch is that it is slow for these reasons:

  • my input data is a bit large (loading (480, 640, 3) size images)
  • I am running on a weak CPU (M3-6Y30 @ 0.90GHz)
  • I have a flatten operation at the end of the feature extractor.

Things I've tried:

  • Other StackOverflow posts suggested adding a max pooling layer to reduce the feature size / remove the extraneous zeros. I used what I think is a pretty large max pool window (thus reducing the feature size significantly), but my prediction time increased.
  • Batch processing doesn't improve the time (which is probably obvious given my M3 CPU). A batch size of 1 image takes 8 seconds, a batch size of 4 takes 32.

Are there any ideas on how to speed up the predict function? I need to run this on at least 10,000 images, and due to the nature of the project I would like to retain as much of the raw data as possible before it goes into the model (I will be comparing it with other feature extraction models).

All my image files are saved locally, but I can try to set up a cloud machine and move my code over there to run with GPU support.

Is the issue simply that I am running the VGG16 model on a dinky CPU?

Guidance would be much appreciated.

There are many issues with your model. The main issue is of course the really slow machine, but as you cannot change that, here is some advice on how you could speed up your computations:

  1. VGG16 is a relatively old architecture. The main issue here is that the so-called volume of tensors (area of feature maps times number of features) decreases really slowly. I would advise you to use a more modern architecture such as ResNet50 or Inception v3, as they have a so-called stem which makes the internal tensors much smaller very quickly. Your speed should benefit from that. There is also a really light architecture called MobileNet which seems perfect for your task.

  2. Downsample your images - with a size of (480, 640) your image has about 6 times as many pixels as the default VGG input of (224, 224). This makes all computations roughly 6 times slower. You could try to downsample the images first and then use the feature extractor.
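To make the size argument concrete, here is a rough back-of-the-envelope sketch in plain Python. It assumes that each of VGG16's five 2x2 max-pooling stages halves the spatial size, that convolution cost scales roughly linearly with pixel count, and that the 8 seconds per image from the question holds across all 10,000 images. The function name is illustrative and not part of Keras.

```python
def vgg16_conv_output_shape(height, width, n_pool_stages=5, channels=512):
    """Spatial shape after VGG16's conv base (include_top=False).

    Assumes each of the five 2x2, stride-2 max-pooling stages floors
    the height and width to half.
    """
    for _ in range(n_pool_stages):
        height //= 2
        width //= 2
    return height, width, channels


question_shape = vgg16_conv_output_shape(480, 640)  # input size from the question
default_shape = vgg16_conv_output_shape(224, 224)   # default VGG16 input size

# Ratio of pixels per image: roughly how much more convolution work is done.
pixel_ratio = (480 * 640) / (224 * 224)

# Flattened feature length without and with the extra MaxPooling2D((3, 4)).
h, w, c = question_shape
features_no_pool = h * w * c
features_with_pool = (h // 3) * (w // 4) * c

# Total time for 10,000 images at the measured 8 seconds per image.
hours_now = 10_000 * 8 / 3600
# Hypothetical estimate, valid only if cost really scales with pixel count.
hours_downsampled = hours_now / pixel_ratio

print(question_shape, default_shape)
print(f"pixel ratio: {pixel_ratio:.2f}")
print(features_no_pool, features_with_pool)
print(f"{hours_now:.1f} h now, about {hours_downsampled:.1f} h at 224x224")
```

Note that the extra pooling layer only shrinks the output vector (from 153,600 to 12,800 values). It runs after all the convolutions, so it cannot reduce the time spent in them, which is consistent with the prediction time not improving.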

VGG16 is a very big model. Similar accuracy can be reached with modern smaller models such as MobileNetV3 or EfficientNet.
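A minimal sketch of what swapping in a lighter backbone could look like, assuming TensorFlow 2.x is installed. MobileNetV2 is used here only as a stand-in because it ships with tf.keras.applications; build_extractor and extract_features are illustrative names, and the (224, 224) target size is an assumption rather than a requirement.

```python
import tensorflow as tf


def build_extractor(input_shape=(224, 224, 3), weights="imagenet"):
    """Build a MobileNetV2 feature extractor (hypothetical helper).

    pooling="avg" replaces Flatten with global average pooling, so each
    image yields one 1280-dimensional feature vector.
    """
    return tf.keras.applications.MobileNetV2(
        input_shape=input_shape,
        include_top=False,
        weights=weights,
        pooling="avg",
    )


def extract_features(model, images):
    """Downsample a batch of images (N, H, W, 3) and run the extractor."""
    # Resize to the spatial size the model was built for.
    resized = tf.image.resize(images, model.input_shape[1:3])
    # Scale pixel values from [0, 255] to the range MobileNetV2 expects.
    scaled = tf.keras.applications.mobilenet_v2.preprocess_input(resized)
    return model.predict(scaled, verbose=0)


# Example usage (downloads ImageNet weights on first run):
# model = build_extractor()
# features = extract_features(model, batch_of_images)
```

Whether the resulting features are good enough for the comparison with other extractors is something only an experiment on your own data can answer.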

However, if you have to use your model, you could try OpenVINO. OpenVINO is optimized for Intel hardware but it should work with any CPU. It optimizes your model by converting it to Intermediate Representation (IR), performing graph pruning and fusing some operations into others while preserving accuracy. Then it uses vectorization at runtime.

Here are some performance benchmarks for various models and CPUs. Your processor (M3-6Y30) is 6th generation, so it should be supported.

It's rather straightforward to convert a Keras model to OpenVINO unless you have fancy custom layers. The full tutorial on how to do it can be found here. Some snippets below.

Install OpenVINO

The easiest way to do it is using pip. Alternatively, you can use this tool to find the best way in your case.

pip install openvino-dev[tensorflow2]

Save your model as SavedModel

OpenVINO is not able to convert an HDF5 model, so you have to save it as a SavedModel first.

import tensorflow as tf
from custom_layer import CustomLayer
model = tf.keras.models.load_model('model.h5', custom_objects={'CustomLayer': CustomLayer})
tf.saved_model.save(model, 'model')

Use Model Optimizer to convert the SavedModel

The Model Optimizer is a command-line tool that comes with the OpenVINO Development Package. It converts the TensorFlow model to IR, which is the default format for OpenVINO. You can also try FP16 precision, which should give you better performance without a significant accuracy drop (just change data_type). Run in the command line:

mo --saved_model_dir "model" --input_shape "[1,224,224,3]" --data_type FP32 --output_dir "model_ir"
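On the FP16 option mentioned above: as a rough illustration of why half precision usually costs little accuracy, the sketch below rounds a few made-up weight values to IEEE 754 half precision using only the Python standard library. The sample values are hypothetical, and this shows per-value rounding error only; it says nothing definitive about the accuracy of a converted model.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Hypothetical weight values, chosen only for illustration.
weights = [0.0123456789, -0.5, 0.333333333, 1.2345678]

relative_errors = []
for w in weights:
    rounded = to_fp16(w)
    rel_err = abs(rounded - w) / abs(w)
    relative_errors.append(rel_err)
    print(f"{w:+.10f} -> {rounded:+.10f} (relative error {rel_err:.2e})")

# For values in the normal FP16 range the relative rounding error is
# bounded by 2**-11, i.e. a bit under 0.05%.
print(f"max relative error: {max(relative_errors):.2e}")
```

Values whose magnitude exceeds the FP16 maximum of 65504 cannot be represented, which is one reason to compare the FP16 and FP32 outputs on a few real images before committing to the smaller format.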

Run the inference

The converted model can be loaded by the runtime and compiled for a specific device, e.g. CPU or GPU (integrated into your CPU, like Intel HD Graphics). If you don't know what the best choice is for you, just use AUTO.

from openvino.runtime import Core

# Load the network
ie = Core()
model_ir = ie.read_model(model="model_ir/model.xml")
compiled_model_ir = ie.compile_model(model=model_ir, device_name="CPU")

# Get output layer
output_layer_ir = compiled_model_ir.output(0)

# Run inference on the input image
result = compiled_model_ir([input_image])[output_layer_ir]

Disclaimer: I work on OpenVINO.
