ImageNet classification challenge: Achieving top-5 error of 0.99472 on test set using VGG11

I recently took an ImageNet pre-trained VGG11 network and made predictions on the ImageNet test dataset. After submitting the predictions file to the evaluation server, I received an email with the following text:

Error: 0.99607 (top-5)  0.99898 (top-1)
Per-class error (classes 1-1000):
1 1
1 1
1 1
...

Does this mean that my top-5 accuracy is 1 - 0.99607 = 0.00393, i.e. 0.393%? If so, the score is far too low.
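As a quick sanity check on that arithmetic, the sketch below converts the reported error rates to accuracies and compares them with the chance level for uniform random guessing over 1000 classes. The variable names are illustrative; only the two error values come from the email.

```python
# Error rates copied from the evaluation email
top5_error = 0.99607
top1_error = 0.99898

# Accuracy is simply 1 - error
top5_accuracy = 1.0 - top5_error
top1_accuracy = 1.0 - top1_error

# Chance level for uniform random guessing over 1000 classes:
# 5 guesses per image for top-5, 1 guess for top-1
num_classes = 1000
chance_top5 = 5 / num_classes
chance_top1 = 1 / num_classes

print(f"top-5 accuracy: {top5_accuracy:.3%} (chance: {chance_top5:.3%})")
print(f"top-1 accuracy: {top1_accuracy:.3%} (chance: {chance_top1:.3%})")
```

Both accuracies come out close to what random guessing would give, which suggests a systematic problem somewhere in the pipeline rather than a merely weak model.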

Could you please point out where I might be going wrong? Here is the code for reference.

PS: I have checked that the images are loaded and predicted in alphabetical order.

import torch
from torchvision import datasets, models, transforms
from tqdm import tqdm

device = torch.device("cuda")

# Pre-trained VGG11 in inference mode
vgg11 = models.vgg11(pretrained=True)
vgg11.to(device)
vgg11.eval()

# Standard ImageNet preprocessing
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])

test_loader = torch.utils.data.DataLoader(
    datasets.ImageFolder("test_dataset", preprocess),
    batch_size=32, shuffle=False)

# Write the top-5 predicted class indices, one image per line
with open("predictions.txt", "w") as fp, torch.no_grad():
    for images, _ in tqdm(test_loader):
        logits = vgg11(images.to(device))
        _, preds = torch.topk(logits, k=5, dim=1)
        for row in preds.cpu().numpy():
            fp.write(" ".join(str(j) for j in row) + "\n")

Based on your code, I believe the error comes from the lack of normalization. I don't have an environment to test on the ImageNet test set, so I made a small example with 4 random cat images from the internet (links: image1, image2, image3, image4).

The test code is below:

import os

import cv2
import numpy as np
import torch
from torchvision import models

# ImageNet channel statistics (RGB order)
mean = torch.tensor([0.485, 0.456, 0.406])
std = torch.tensor([0.229, 0.224, 0.225])


def read_image(image_path, size=224):
    """Load an image as a normalized 1x3xHxW float tensor."""
    image = cv2.imread(image_path)  # BGR, HxWx3, uint8
    image = cv2.resize(image, (size, size))
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32)
    # HxWx3 -> 1x3xHxW, scaled to [0, 1]
    image = torch.tensor(image).permute(2, 0, 1).unsqueeze(0) / 255.
    # Per-channel normalization
    image = (image - mean[None, :, None, None]) / std[None, :, None, None]
    return image


vgg11 = models.vgg11(pretrained=True)
vgg11.eval()

from_path = './../test_image/'
cat_name = ['cat1', 'cat2', 'cat3', 'cat4']

# Stack the four images into a single 4x3x224x224 batch
images = torch.cat([read_image(os.path.join(from_path, f'{name}.png'))
                    for name in cat_name], 0)

with torch.no_grad():
    preds = vgg11(images).cpu().numpy()

result = np.argmax(preds, axis=1)  # top-1 class index per image
print(result)

Without normalization, the result is ['Egyptian cat', 'sock', 'Komodo dragon', 'doormat'] ([285, 806, 48, 539]).

With normalization, the result is ['tabby cat', 'tabby cat', 'leopard', 'Egyptian cat'] ([281, 281, 288, 285]).
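To make the normalization step itself concrete, here is a minimal NumPy sketch of the same per-channel computation, assuming an HxWx3 uint8 RGB image. The `normalize` helper and the uniform gray test image are hypothetical and only for illustration; the mean and std values are the ones used in the code above.

```python
import numpy as np

# ImageNet channel statistics used in the code above (RGB order)
mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize(image_uint8):
    """Scale an HxWx3 uint8 RGB image to [0, 1], then standardize each channel."""
    scaled = image_uint8.astype(np.float32) / 255.0
    # mean and std have shape (3,), so they broadcast over the last (channel) axis
    return (scaled - mean) / std


# Hypothetical input: a uniform mid-gray image, not one of the cat images
gray = np.full((224, 224, 3), 128, dtype=np.uint8)
out = normalize(gray)

print(out.shape)   # layout is still HxWx3 here
print(out[0, 0])   # normalized values of the first pixel, one per channel
```

Note that this sketch keeps the channels-last layout; the PyTorch code above additionally permutes the image to channels-first before feeding it to the network.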
