简体   繁体   English

训练 Keras 模型会产生多个优化器错误

[英]Training a Keras model yields multiple optimizer errors

So I need to retrain Tiny YOLO using my own dataset.所以我需要使用我自己的数据集重新训练 Tiny YOLO。 The model I am using can be found here: keras-yolo3 .我正在使用的模型可以在这里找到: keras-yolo3

I started training and I get multiple optimizer errors, added the code of the errors to stop confusion.我开始训练并遇到多个优化器错误,添加了错误代码以防止混淆。 And I noticed the training is going slow even tho it should use the GPU, and after digging a bit I found that this is not using the GPU for training.我注意到即使它应该使用 GPU,训练也会变慢,经过深入研究后,我发现这不是使用 GPU 进行训练。 I should note that on another smaller network which I used for learning training uses GPU so everything is set correctly from that side, and they are no errors of this type when I did that training.我应该注意到,在我用于学习训练的另一个较小的网络上使用 GPU,所以从那一边一切都设置正确,当我进行训练时,它们没有这种类型的错误。

Is this slow and somewhat CPU training because of said errors?由于上述错误,这是否缓慢且有点 CPU 训练? How can I fix this does anyone know?有谁知道我该如何解决这个问题?

Using TensorFlow backend.
WARNING: Logging before flag parsing goes to stderr.
2019-08-19 09:45:08.057713: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library nvcuda.dll
2019-08-19 09:45:08.264577: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1060 6GB major: 6 minor: 1 memoryClockRate(GHz): 1.8475
pciBusID: 0000:01:00.0
2019-08-19 09:45:08.270723: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-08-19 09:45:08.275827: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-08-19 09:45:09.214197: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-08-19 09:45:09.217605: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0
2019-08-19 09:45:09.219777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N
2019-08-19 09:45:09.222399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4712 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0, compute capability: 6.1)
Create Tiny YOLOv3 model with 6 anchors and 80 classes.
Load weights model_data/tiny_yolo_weights.h5.
Freeze the first 42 layers of total 44 layers.
Train on 8298 samples, val on 922 samples, with batch size 32.
Epoch 1/50
2019-08-19 09:45:19.742610: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] shape_optimizer failed: Invalid argument: Subshape must have computed start >= end since stride is negative, but is 0 and 2 (computed from start 0 and end 9223372036854775807 over shape with rank 2 and stride-1)
2019-08-19 09:45:19.781035: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] remapper failed: Invalid argument: Subshape must have computed start >= end since stride is negative, but is 0 and 2 (computed from start 0 and end 9223372036854775807 over shape with rank 2 and stride-1)
2019-08-19 09:45:19.935930: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] layout failed: Invalid argument: Subshape must have computed start >= end since stride is negative, but is 0 and 2 (computed from start 0 and end 9223372036854775807 over shape with rank 2 and stride-1)
2019-08-19 09:45:20.168936: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] shape_optimizer failed: Invalid argument: Subshape must have computed start >= end since stride is negative, but is 0 and 2 (computed from start 0 and end 9223372036854775807 over shape with rank 2 and stride-1)
2019-08-19 09:45:20.205304: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] remapper failed: Invalid argument: Subshape must have computed start >= end since stride is negative, but is 0 and 2 (computed from start 0 and end 9223372036854775807 over shape with rank 2 and stride-1)
258/259 [============================>.] - ETA: 3s - loss: 41.82962019-08-19 10:01:51.053474: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] remapper failed: Invalid argument: Subshape must have computed start >= end since stride is negative, but is 0 and 2 (computed from start 0 and end 9223372036854775807 over shape with rank 2 and stride-1)
2019-08-19 10:01:51.138957: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] layout failed: Invalid argument: Subshape must have computed start >= end since stride is negative, but is 0 and 2 (computed from start 0 and end 9223372036854775807 over shape with rank 2 and stride-1)
2019-08-19 10:01:51.243888: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] remapper failed: Invalid argument: Subshape must have computed start >= end since stride is negative, but is 0 and 2 (computed from start 0 and end 9223372036854775807 over shape with rank 2 and stride-1)
259/259 [==============================] - 1078s 4s/step - loss: 41.8008 - val_loss: 35.7122

I have found solution here: https://github.com/tensorflow/tensorrt/issues/118我在这里找到了解决方案: https : //github.com/tensorflow/tensorrt/issues/118

You have to change lines(140/141) in yolo3/model.py:您必须在 yolo3/model.py 中更改行(140/141):

box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[::-1], K.dtype(feats))
box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[::-1], K.dtype(feats))

to:到:

box_xy = (K.sigmoid(feats[..., :2]) + grid) / K.cast(grid_shape[...,::-1], K.dtype(feats))
box_wh = K.exp(feats[..., 2:4]) * anchors_tensor / K.cast(input_shape[...,::-1], K.dtype(feats))

Also in my case helps decrease batch size from 8 to 4 .同样在我的情况下,有助于将批量大小8减少到4

It is a version issue. 这是一个版本问题。 Please reinstall Keras and Tensorflow. 请重新安装Keras和Tensorflow。 Use the version 2.1.5 for Keras and 1.6.0 for Tensorflow. 对于Keras使用2.1.5版本,对于Tensorflow使用1.6.0版本。 Please check the readme for the requirements. 请检查自述文件中的要求。 Additionally, consider changing the train.py file as per the fix in https://github.com/qqwweee/keras-yolo3/issues/548 . 此外,请考虑根据https://github.com/qqwweee/keras-yolo3/issues/548中的修复程序更改train.py文件。 That is an issue similar to yours. 那是和你相似的问题。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM