
Training an image classifier with over 300k classes

Is it possible to train an image-classifier network with a very large number of classes (say 300k), where each class has at least 10 images split across train/test/validation (i.e. more than 3 million 250x250x3 images)?

I tried training the dataset with a ResNet50 model and dropped the batch size all the way to 1, but still ran into OOM problems (2080 Ti). I found the OOM was caused by the sheer number of parameters, so I tried training on a very basic 10-layer model with a batch size of 1. It runs, but the speed/accuracy is, unsurprisingly, terrible.
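
For a sense of scale, here is a rough estimate (assuming a ResNet50-style 2048-dimensional pooled feature, fp32 weights and the Adam optimizer; none of this is from the original post) of why the 300k-way output layer alone already eats most of a 2080 Ti's 11 GB before any activations are counted:

# Rough memory estimate for the final Dense layer of a 300k-class classifier.
# Assumes a 2048-d feature vector (ResNet50 global average pool) and fp32 (4 bytes).
features = 2048
classes = 300_000
bytes_per_param = 4  # fp32

weights = features * classes + classes          # kernel + bias
weight_mem_gb = weights * bytes_per_param / 1e9

# Adam keeps two extra fp32 slots (m and v) per trainable parameter, tripling the footprint.
adam_mem_gb = 3 * weight_mem_gb

print(f"output-layer params: {weights:,}")              # ~614 million
print(f"weights only:        {weight_mem_gb:.1f} GB")   # ~2.5 GB
print(f"with Adam state:     {adam_mem_gb:.1f} GB")     # ~7.4 GB, before activations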

Is there any way I can split the training set into smaller groups of classes, for example:

1st.h5 = classes 1 ~ 20,000

2nd.h5 = classes 20,001 ~ 40,000

3rd.h5 = classes 40,001 ~ 60,000, and so on,

and then merge them into a single h5 file that can be loaded to recognize all 300k different classes?
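
The splitting part, at least, is easy to sketch with tf.data: filter each shard by class range and shift the labels so every sub-model trains on a 0-based label space. This is only a sketch; make_base_dataset() is a hypothetical loader yielding (image, integer_label) pairs.

import tensorflow as tf

def make_class_shard(base_ds, lo, hi):
    """Keep only samples whose integer class id is in [lo, hi) and remap labels to 0-based."""
    shard = base_ds.filter(lambda image, label: tf.logical_and(label >= lo, label < hi))
    return shard.map(lambda image, label: (image, label - lo))

# hypothetical loader yielding (250x250x3 image, int label) pairs
base_ds = make_base_dataset()

# first shard: the first 20,000 classes, trained into its own sub-model (e.g. 1st.h5)
shard_1 = make_class_shard(base_ds, 0, 20_000).batch(32)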


EDIT, following ASHISH's suggestion:

I have (I think) successfully merged 2 models into one, but the merged model has twice the number of layers...

Source code:

from tensorflow.keras.models import load_model, Model
from tensorflow.keras.layers import Dense, concatenate

model1 = load_model('001.h5')
model2 = load_model('002.h5')

for layer in model1.layers:
    layer._name = layer._name + "_1"  # avoid duplicate layer names, which would otherwise throw an error
    layer.trainable = False           # freeze the pretrained sub-model

for layer in model2.layers:
    layer._name = layer._name + "_2"
    layer.trainable = False

# Take each sub-model's final (softmax) output and add a trainable Dense head on top.
x1 = model1.layers[-1].output
classes = x1.shape[1]
x1 = Dense(classes, activation='relu', name='out1')(x1)

x2 = model2.layers[-1].output
x2 = Dense(x2.shape[1], activation='relu', name='out2')(x2)
classes += x2.shape[1]

# Concatenate both heads and predict over the combined class set.
x = concatenate([x1, x2])
output_layer = Dense(classes, activation='softmax', name='combined_layer')(x)
new_model = Model(inputs=[model1.inputs, model2.inputs], outputs=output_layer)
new_model.summary()
new_model.save('new_model.h5', overwrite=True)

The resulting model looks like this:

Model: "model"
_________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
=========================================================================
input_1_1 (InputLayer)          [(None, 224, 224, 3) 0                                            
_________________________________________________________________________
input_1_2 (InputLayer)          [(None, 224, 224, 3) 0                                            
_________________________________________________________________________
conv1_pad_1 (ZeroPadding2D)     (None, 230, 230, 3)  0           input_1_1[0][0]                  
_________________________________________________________________________
conv1_pad_2 (ZeroPadding2D)     (None, 230, 230, 3)  0           input_1_2[0][0]                  
_________________________________________________________________________
conv1_conv_1 (Conv2D)           (None, 112, 112, 64) 9472        conv1_pad_1[0][0]                
_________________________________________________________________________
conv1_conv_2 (Conv2D)           (None, 112, 112, 64) 9472        conv1_pad_2[0][0]                

...

...

conv5_block3_out_1 (Activation) (None, 7, 7, 2048)   0           conv5_block3_add_1[0][0]         
_________________________________________________________________________
conv5_block3_out_2 (Activation) (None, 7, 7, 2048)   0           conv5_block3_add_2[0][0]         
_________________________________________________________________________
avg_pool_1 (GlobalAveragePoolin (None, 2048)         0           conv5_block3_out_1[0][0]         
_________________________________________________________________________
avg_pool_2 (GlobalAveragePoolin (None, 2048)         0           conv5_block3_out_2[0][0]         
_________________________________________________________________________
probs_1 (Dense)                 (None, 953)          1952697     avg_pool_1[0][0]                 
_________________________________________________________________________
probs_2 (Dense)                 (None, 3891)         7972659     avg_pool_2[0][0]                 
_________________________________________________________________________
out1 (Dense)                    (None, 953)          909162      probs_1[0][0]                    
_________________________________________________________________________
out2 (Dense)                    (None, 3891)         15143772    probs_2[0][0]                    
_________________________________________________________________________
concatenate (Concatenate)       (None, 4844)         0           out1[0][0]                       
                                                                 out2[0][0]                       
_________________________________________________________________________
combined_layer (Dense)          (None, 4844)         23469180    concatenate[0][0]                
=========================================================================
Total params: 96,622,894
Trainable params: 39,522,114
Non-trainable params: 57,100,780

As you can see, every layer is doubled because of Model(inputs=[input1, input2]). That causes problems when I want to use this model to predict an image. Is there any way to do this without doubling all the earlier layers and only adding the trailing Dense layers? At this rate I'll end up loading more parameters than before in no time...
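
If the two sub-models genuinely share the same frozen backbone, one way to avoid the duplication is to reuse one model's pooled features and attach the other model's head to them, so only the classification heads stay separate. This is a minimal sketch under that strong assumption, not the poster's code and not a drop-in fix:

from tensorflow.keras.models import load_model, Model
from tensorflow.keras.layers import Dense, concatenate

# Assumption: both sub-models were fine-tuned on an identical, frozen ResNet50 backbone,
# so model1's feature extractor can stand in for model2's as well.
model1 = load_model('001.h5')
model2 = load_model('002.h5')

features = model1.layers[-2].output   # model1's avg_pool output, shape (None, 2048)
head1 = model1.layers[-1].output      # model1's existing 953-way softmax
model2.layers[-1]._name = 'probs_2'   # rename to avoid a duplicate layer name
head2 = model2.layers[-1](features)   # reuse model2's 3891-way head on the shared features

x = concatenate([head1, head2])
output_layer = Dense(head1.shape[1] + head2.shape[1], activation='softmax',
                     name='combined_layer')(x)

single_input_model = Model(inputs=model1.input, outputs=output_layer)
single_input_model.summary()          # one backbone, one input, two heads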

Technically it's possible. Since you have 3 classifiers (1.h5, 2.h5, 3.h5), you can load these models with their weights and then use the functional API in TensorFlow (https://www.tensorflow.org/guide/keras/functional), where the concatenate() API will combine the outputs of the 3 classifiers into a single vector, and then use a few dense layers with an activation function to make the final prediction.
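
For completeness, using a merged multi-input model of this kind at prediction time just means feeding the same preprocessed image to every input branch. A sketch, assuming the two-input new_model.h5 saved in the edit above (the random array is only a stand-in for real preprocessing):

import numpy as np
from tensorflow.keras.models import load_model

new_model = load_model('new_model.h5')

# One preprocessed 224x224 RGB image; a random array stands in for real preprocessing here.
img = np.random.rand(1, 224, 224, 3).astype('float32')

probs = new_model.predict([img, img])      # one copy of the image per sub-model input
predicted_class = int(np.argmax(probs[0])) # index into the combined 0..4843 class space
print(predicted_class)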
