Caffe model fails to learn

Question

I implemented the following convolutional model in Keras. After training for 100,000 epochs, it performs very well, with very high accuracy.

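# Imports for the Keras 1.x API used below
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D

# Assumption: 11 classes, taken from num_output of fc2 in the Caffe net below
nb_classes = 11
img_rows, img_cols = 24, 15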
input_shape = (img_rows, img_cols, 1)
nb_filters = 32
pool_size = (2, 2)
kernel_size = (3, 3)

model = Sequential()
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1],
                        border_mode='valid',
                        input_shape=input_shape))
model.add(Activation('relu'))
model.add(Convolution2D(nb_filters, kernel_size[0], kernel_size[1]))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=pool_size))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(128))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(nb_classes))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='adadelta',
              metrics=['accuracy'])

However, after I tried to implement the same model in Caffe, it fails to train: the loss stays almost fixed, between 2.1 and 2.6. Here is my Caffe prototxt implementation:

name: "FneishNet"
layer {
  name: "inlayer1"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  data_param {
    source: "examples/fneishnet_numbers/fneishnet_numbers_train_lmdb"
    batch_size: 128
    backend: LMDB
  }
}
layer {
  name: "inlayer1"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  data_param {
    source: "examples/fneishnet_numbers/fneishnet_numbers_val_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "conv1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 32
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv2"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 1
  }
}
layer {
  name: "drop1"
  type: "Dropout"
  bottom: "pool1"
  top: "pool1"
  dropout_param {
    dropout_ratio: 0.25
  }
}
layer {
  name: "flatten1"
  type: "Flatten"
  bottom: "pool1"
  top: "flatten1"
}
layer {
  name: "fc1"
  type: "InnerProduct"
  bottom: "flatten1"
  top: "fc1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 128
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "fc1"
  top: "fc1"
}
layer {
  name: "drop2"
  type: "Dropout"
  bottom: "fc1"
  top: "fc1"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc2"
  type: "InnerProduct"
  bottom: "fc1"
  top: "fc2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 11
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc2"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc2"
  bottom: "label"
  top: "loss"
}

Here is my solver (hyperparameters):

net: "models/fneishnet_numbers/train_val.prototxt"
test_iter: 1000
test_interval: 4000
test_initialization: false
display: 40
average_loss: 40
base_lr: 0.01
gamma: 0.1
lr_policy: "poly"
power: 0.5
max_iter: 3000000
momentum: 0.9
weight_decay: 0.0005
snapshot: 100000
snapshot_prefix: "models/fneishnet_numbers/fneishnet_numbers_quick"
solver_mode: CPU

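I believe that if I had converted the model to Caffe correctly, it should behave the same way as it does in Keras, so I think I am missing something. Any help would be appreciated, thanks.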
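A quick sanity check on that loss range, as a sketch only: assuming 11 classes (the num_output of fc2) and a network whose softmax output is uniform, the cross-entropy loss is ln(11). The variable names here are illustrative and not part of the original code.

```python
import math

# Assumption: 11 classes, taken from num_output of the fc2 layer.
num_classes = 11

# Cross-entropy of a uniform prediction over num_classes classes.
chance_loss = -math.log(1.0 / num_classes)

print(round(chance_loss, 3))
```

The result falls inside the reported 2.1 to 2.6 range, which would be consistent with the Caffe net predicting at roughly chance level rather than learning slowly.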

Answer 1

poly: the effective learning rate follows a polynomial decay, reaching zero at max_iter. It returns base_lr * (1 - iter/max_iter) ^ power

So basically, are you sure you want power set to 0.5 in base_lr * (1 - iter/max_iter) ^ power? I think that might be the problem, since it controls how the learning rate decays. Maybe try 2?
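To see what the two power values do, here is a minimal sketch of the poly formula quoted above, using base_lr and max_iter from the solver in the question. The function name poly_lr and the sampled iterations are illustrative choices, not Caffe API.

```python
def poly_lr(base_lr, it, max_iter, power):
    """Poly policy as quoted above: base_lr * (1 - it/max_iter) ** power."""
    return base_lr * (1.0 - float(it) / max_iter) ** power

base_lr = 0.01      # from the solver in the question
max_iter = 3000000  # from the solver in the question

# Compare the schedule for power = 0.5 (current) and power = 2 (suggested).
for it in (0, 750000, 1500000, 2250000):
    lr_half = poly_lr(base_lr, it, max_iter, 0.5)
    lr_two = poly_lr(base_lr, it, max_iter, 2.0)
    print(it, lr_half, lr_two)
```

With power 0.5 the learning rate stays close to base_lr for most of the run, while power 2 lowers it much sooner. Whether that alone fixes the stuck loss is not confirmed here; it is only the change the answer suggests trying.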

Solution 1
0 Accepted 2018-10-01 17:27:00