Are there any options or parameters that I can change to reduce the training time when training a convolutional neural network (DenseNet)?
I am using the Python library ImageAI to train a DenseNet for my research.
ImageAI GitHub: https://github.com/OlafenwaMoses/ImageAI
ImageAI example: https://towardsdatascience.com/train-image-recognition-ai-with-5-lines-of-code-8ed0bdd8d9ba
I am currently doing research on symbol recognition (2D building drawing symbols) using a CNN approach (DenseNet). One example, the symbol for a VAV box: https://ibb.co/cyhwRvf
I am trying to classify 39 classes (one per symbol) and have 2,000 images of each class for training (2,000 x 39 = 78,000) and 1,000 images of each class for testing (1,000 x 39 = 39,000). The total size of the dataset is 1.82 GB (I consider this relatively small, but please correct me if I am wrong).
The problem is that training takes a very long time.
I am using a GPU (Nvidia GeForce RTX 2080 Ti), and training takes 3 days when I set the number of epochs (num_experiments in ImageAI) to 200.
I would like to know whether there is a way to reduce the training time. Are there any parameters I can change, or any other options?
Or is this a normal training time given the size of the dataset and the GPU I am using?
The five lines of code for training are the following:
from imageai.Prediction.Custom import ModelTraining
model_trainer = ModelTraining()
model_trainer.setModelTypeAsDenseNet()
model_trainer.setDataDirectory("mechsymbol")
model_trainer.trainModel(num_objects=39, num_experiments=200, enhance_data=True, batch_size=32, show_network_summary=True)
I'm fairly new to ImageAI/TensorFlow and still learning.
In terms of parameters that speed up training, I think setting enhance_data to False will make training somewhat faster. This parameter generates additional samples from the existing images, so it should be set to True only if you do not have a large set of images to train on (fewer than 1,000).
enhance_data = False
It is also worth mentioning that you do not have to run all 200 epochs: if the results stop improving after, say, 50 epochs because the model is overfitting, you can stop there.
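The stopping rule can be sketched in plain Python. This is only an illustration under assumptions: the function name `epochs_to_run`, the `patience` parameter and the loss values are made up for the example, and I am not sure ImageAI exposes such a rule directly, so you may have to watch the validation metrics and interrupt the run by hand.

```python
def epochs_to_run(val_losses, patience=5):
    """Return how many epochs to run before stopping early.

    Stops once the validation loss has not improved for `patience`
    consecutive epochs; otherwise runs through all recorded epochs.
    """
    best = float("inf")
    stale = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses)


# Hypothetical validation losses: improvement stops after epoch 4.
losses = [0.90, 0.60, 0.45, 0.40, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46]
print(epochs_to_run(losses, patience=3))
```

With these made-up numbers the run would stop after 7 of the 10 epochs, since epochs 5 to 7 bring no improvement over epoch 4.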
Have you heard about "mixed precision training", which uses FP16 instead of FP32?
I have not tried it myself, since it would make only a marginal difference on my Nvidia GTX 1080 Ti, which does not have tensor cores. But you can read more about it here: https://medium.com/tensorflow/automatic-mixed-precision-in-tensorflow-for-faster-ai-training-on-nvidia-gpus-6033234b2540
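As far as I understand from the linked article, automatic mixed precision in NVIDIA's TensorFlow builds is switched on with an environment variable. A minimal sketch, assuming your ImageAI installation runs on a TensorFlow build that honours this variable; I have not verified that ImageAI's training path picks it up, and the helper name `enable_auto_mixed_precision` is made up. Your RTX 2080 Ti does have tensor cores, so it may be worth a try.

```python
import os


def enable_auto_mixed_precision():
    """Request TensorFlow's automatic mixed precision graph rewrite.

    To be safe, call this before TensorFlow (or ImageAI) is imported.
    Assumption: the installed TensorFlow build honours this variable.
    """
    os.environ["TF_ENABLE_AUTO_MIXED_PRECISION"] = "1"


enable_auto_mixed_precision()

# Import ImageAI only after the variable is set, then train as usual:
# from imageai.Prediction.Custom import ModelTraining
```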
For reference:
When I train a ResNet on an image data set with 62 classes (26 GB), with between 1,000 and 2,000 training images and about 200 validation images per class, each epoch takes about 10 minutes.
That would be about 33 hours for 200 epochs.
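Here is the arithmetic behind that estimate, with the same calculation for the numbers in the question (3 days for 200 epochs), as a quick check. The variable names are only for illustration.

```python
# Figures quoted in this thread.
my_minutes_per_epoch = 10  # my ResNet run, 62 classes
epochs = 200

my_total_hours = my_minutes_per_epoch * epochs / 60
print(f"ResNet run: {my_total_hours:.1f} hours for {epochs} epochs")

# The question reports 3 days for 200 epochs of DenseNet.
your_total_minutes = 3 * 24 * 60
your_minutes_per_epoch = your_total_minutes / epochs
print(f"DenseNet run: {your_minutes_per_epoch:.1f} minutes per epoch")
```

That gives about 33.3 hours for my run and about 21.6 minutes per epoch for yours, so roughly twice my time per epoch. The network, the GPU and the enhance_data setting all differ, so the two figures are not directly comparable.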
These are my ImageAI Python lines of code:
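from imageai.Prediction.Custom import ModelTraining
classes_in_train = 62  # number of classes in the training set (see the log output below)
model_trainer = ModelTraining()
model_trainer.setModelTypeAsResNet()
model_trainer.setDataDirectory('.')
model_trainer.trainModel(num_objects=classes_in_train, num_experiments=100,
                         enhance_data=False, batch_size=32)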
Found 81086 images belonging to 62 classes.
Found 13503 images belonging to 62 classes.
Epoch 38/100
231/2533 [=>............................] - ETA: 10:00 - loss: 2.9065e-04 - acc: 0.9999
Hope this gives you some helpful info.