Poor accuracy with Keras VGG16 when reproducing a paper's results

Question

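I am trying to reproduce some results from the paper describing the Grad-CAM method, using Keras with the TensorFlow-GPU backend, and I am getting completely incorrect labels.

I took a screenshot of Figure 1(a) from that paper and am trying to classify it with the pretrained VGG16 from Keras Applications.

Here is my image:

Here is my code (a cell from a Jupyter notebook). Part of the code was copied from the Keras manual.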

import imageio
import numpy as np
from matplotlib import pyplot as plt
from skimage.transform import resize

from keras import activations
from keras.applications import VGG16
from keras.applications.vgg16 import preprocess_input, decode_predictions

# Build the VGG16 network with ImageNet weights
model = VGG16(weights='imagenet', include_top=True)

%matplotlib inline

dog_img = imageio.imread(r"F:\tmp\Opera Snapshot_2018-09-24_133452_arxiv.org.png")
dog_img = dog_img[:, :, 0:3]   # Opera has added alpha channel
dog_img = resize(dog_img, (224, 224, 3))

x = np.expand_dims(dog_img, axis=0)
x = preprocess_input(x, mode='tf')

pred = model.predict(x)
decode_predictions(pred)

Output:

[[('n03788365', 'mosquito_net', 0.017053505),
  ('n03291819', 'envelope', 0.015034639),
  ('n15075141', 'toilet_tissue', 0.012603286),
  ('n01737021', 'water_snake', 0.010620943),
  ('n04209239', 'shower_curtain', 0.009625845)]]

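However, when I submit the same image to the online service run by the paper's authors at http://gradcam.cloudcv.org/classification, I see the correct label "Boxer".

This is the output of what they call the "Terminal":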

Completed the Classification Task

"Time taken for inference in torch: 9.0"
"Total time taken: 9.12565684319"
{"classify_gcam": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/classify_gcam_243.png", "execution_time": 9.0, "label": 243.0, "classify_gb_gcam": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/classify_gb_gcam_243.png", "classify_gcam_raw": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/classify_gcam_raw_243.png", "input_image": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/Opera Snapshot_2018-09-24_133452_arxiv.org.png", "pred_label": 243.0, "classify_gb": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/classify_gb_243.png"}
Completed the Classification Task

"Time taken for inference in torch: 9.0"
"Total time taken: 9.05940508842"
{"classify_gcam": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/classify_gcam_243.png", "execution_time": 9.0, "label": 243.0, "classify_gb_gcam": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/classify_gb_gcam_243.png", "classify_gcam_raw": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/classify_gcam_raw_243.png", "input_image": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/Opera Snapshot_2018-09-24_133452_arxiv.org.png", "pred_label": 243.0, "classify_gb": "./media/grad_cam/classification/86560f84-bfe5-11e8-a657-22000b4a9274/classify_gb_243.png"}
Job published successfully
Publishing job to Classification Queue
Starting classification job on VGG_ILSVRC_16_layers.caffemodel
Job published successfully
Publishing job to Classification Queue
Starting classification job on VGG_ILSVRC_16_layers.caffemodel

I am using 64-bit Anaconda Python on Windows 7.

Versions of the relevant software on my PC:

keras                     2.2.2                         0
keras-applications        1.0.4                    py36_1
keras-base                2.2.2                    py36_0
keras-preprocessing       1.0.2                    py36_1
tensorflow                1.10.0          eigen_py36h849fbd8_0
tensorflow-base           1.10.0          eigen_py36h45df0d8_0

What am I doing wrong? How can I get the boxer label?

Answer 1

The following line apparently fails to execute:

dog_img = dog_img[:, :, 0:3]   # Opera has added alpha channel

So I loaded the image with Keras's load_img utility instead, which does not add an alpha channel.

Full code:

import imageio
from matplotlib import pyplot as plt
from skimage.transform import resize
import numpy as np
from keras import activations
from keras.applications import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input, decode_predictions

# Build the VGG16 network with ImageNet weights
model = VGG16(weights='imagenet', include_top=True)
dog_img = image.img_to_array(image.load_img(r"F:\tmp\Opera Snapshot_2018-09-24_133452_arxiv.org.png", target_size=(224, 224)))

x = np.expand_dims(dog_img, axis=0)
x = preprocess_input(x)

pred = model.predict(x)
print(decode_predictions(pred))

[[('n02108089', 'boxer', 0.29122102), ('n02108422', 'bull_mastiff', 0.199128), ('n02129604', 'tiger', 0.10050287), ('n02123159', 'tiger_cat', 0.09733449), ('n02109047', 'Great_Dane', 0.056869864)]]
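
A likely reason the two versions behave so differently, beyond the alpha channel, is input scaling. skimage's resize returns floats in [0, 1] by default, and mode='tf' then rescales them as if they were 0-255 values. The code above instead feeds 0-255 values through the default 'caffe' mode of preprocess_input (RGB to BGR plus per-channel mean subtraction). The NumPy sketch below re-implements both modes by hand to show the resulting value ranges. caffe_mode and tf_mode are hypothetical helper names, not Keras functions, and the random array only stands in for a real image.

```python
import numpy as np

# ImageNet per-channel means in BGR order, as used by the 'caffe' mode.
BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)

def caffe_mode(rgb):
    """Hand-written stand-in for 'caffe' mode: RGB -> BGR, subtract means."""
    bgr = rgb[..., ::-1].astype(np.float32)
    return bgr - BGR_MEAN

def tf_mode(rgb):
    """Hand-written stand-in for 'tf' mode: map 0..255 onto -1..1."""
    return rgb.astype(np.float32) / 127.5 - 1.0

rng = np.random.default_rng(0)
# Stand-in for a loaded 224x224 RGB image with 0..255 pixel values.
img_uint8 = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
# The same image as floats in [0, 1], the range skimage's resize produces.
img_unit = img_uint8 / 255.0

good = caffe_mode(img_uint8)  # scaling used in the answer's code
bad = tf_mode(img_unit)       # scaling used in the question's code

print("caffe mode on 0..255 input:", good.min(), good.max())
print("tf mode on 0..1 input:     ", bad.min(), bad.max())
```

With the [0, 1] input, every value after tf_mode lands within about 0.008 of -1, so the network effectively sees a flat image. That would be consistent with the near-uniform probabilities in the question.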

Answer 2

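Given that all of the output probabilities are very low and spread roughly evenly around 0.01, my guess is that you are preprocessing the image incorrectly and passing something that looks like scrambled noise to model.predict(). Try debugging and displaying the image with imshow before you call predict().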
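
That observation can be turned into a quick sanity check. The sketch below takes a top-5 list in the format printed by decode_predictions and flags a prediction whose best score is very low. looks_uninformative and the 0.05 threshold are illustrative assumptions, not part of Keras. The two lists are copied from the outputs shown in this thread.

```python
def looks_uninformative(top5, min_top1=0.05):
    """Return True when the best class probability is below min_top1."""
    top1 = max(prob for _, _, prob in top5)
    return top1 < min_top1

# Top-5 output from the question (incorrect labels).
question_top5 = [
    ('n03788365', 'mosquito_net', 0.017053505),
    ('n03291819', 'envelope', 0.015034639),
    ('n15075141', 'toilet_tissue', 0.012603286),
    ('n01737021', 'water_snake', 0.010620943),
    ('n04209239', 'shower_curtain', 0.009625845),
]

# Top-5 output from the accepted answer (correct label).
answer_top5 = [
    ('n02108089', 'boxer', 0.29122102),
    ('n02108422', 'bull_mastiff', 0.199128),
    ('n02129604', 'tiger', 0.10050287),
    ('n02123159', 'tiger_cat', 0.09733449),
    ('n02109047', 'Great_Dane', 0.056869864),
]

print(looks_uninformative(question_top5))  # True
print(looks_uninformative(answer_top5))    # False
```

Printing x.min(), x.max() and x.dtype just before model.predict(x) is another cheap way to spot a wrongly scaled input.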
