HTTP Error when trying to download MNIST data

I am using Google Colab to train a LeNet-300-100 fully-connected neural network on MNIST with Python 3 and PyTorch 1.8.

To apply the transformations and download the MNIST dataset, I use the following code:

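import numpy as np
import torchvision
from torchvision import transforms

# MNIST dataset statistics:
# mean = tensor([0.1307]) & std dev = tensor([0.3081])
mean = np.array([0.1307])
std_dev = np.array([0.3081])

transforms_apply = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean = mean, std = std_dev)
    ])

# Download call, reconstructed from the traceback below
train_dataset = torchvision.datasets.MNIST(
    root = './data', train = True,
    transform = transforms_apply, download = True
    )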

which gives the error:

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
 in ()
      2 train_dataset = torchvision.datasets.MNIST(
      3     root = './data', train = True,
----> 4     transform = transforms_apply, download = True
      5     )
      6

11 frames
/usr/lib/python3.7/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
    647 class HTTPDefaultErrorHandler(BaseHandler):
    648     def http_error_default(self, req, fp, code, msg, hdrs):
--> 649         raise HTTPError(req.full_url, code, msg, hdrs, fp)
    650
    651 class HTTPRedirectHandler(BaseHandler):

HTTPError: HTTP Error 503: Service Unavailable

What's wrong?

I was having the same 503 error, and this worked for me:

!wget www.di.ens.fr/~lelarge/MNIST.tar.gz
!tar -zxvf MNIST.tar.gz

from torchvision.datasets import MNIST
from torchvision import transforms

# With the archive unpacked under ./MNIST, torchvision finds the files
# locally and skips the download.
train_set = MNIST('./', download=True,
                  transform=transforms.Compose([
                      transforms.ToTensor(),
                  ]), train=True)

test_set = MNIST('./', download=True,
                 transform=transforms.Compose([
                     transforms.ToTensor(),
                 ]), train=False)
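
Outside a notebook, where the `!wget` and `!tar` shell escapes are not available, the same two steps can be sketched with the Python standard library. This is a minimal sketch and not part of the original answer: the helper names `fetch_archive` and `extract_archive` are hypothetical, and it assumes the archive URL quoted above is still reachable over plain HTTP.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL taken from the answer above; that it is still live is an assumption.
MNIST_ARCHIVE_URL = "http://www.di.ens.fr/~lelarge/MNIST.tar.gz"


def fetch_archive(url, dest):
    """Download `url` to the path `dest` (hypothetical stand-in for wget)."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(dest))
    return dest


def extract_archive(archive, target_dir):
    """Unpack a .tar.gz into `target_dir` (stand-in for tar -zxvf).

    Returns the list of member names found in the archive.
    """
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive), "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(str(target_dir))
    return names
```

A typical call would be `extract_archive(fetch_archive(MNIST_ARCHIVE_URL, "MNIST.tar.gz"), ".")`, after which the `MNIST('./', ...)` calls above can be used unchanged.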

There has been a lot of trouble with the MNIST copy hosted at http://yann.lecun.com/exdb/mnist/, so PyTorch obtained permission to host the dataset itself, and it is now served from Amazon AWS.

Unfortunately, the fix is currently only available in the nightly build.

A hot fix I found useful is:

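from torchvision import datasets, transforms

# Re-point every MNIST resource URL at the AWS mirror, keeping file names and checksums.
# This relies on `resources` holding full URLs, as in the affected torchvision versions.
new_mirror = 'https://ossci-datasets.s3.amazonaws.com/mnist'
datasets.MNIST.resources = [
    ('/'.join([new_mirror, url.split('/')[-1]]), md5)
    for url, md5 in datasets.MNIST.resources
]

transform = transforms.ToTensor()  # substitute your own transform pipeline here
train_dataset = datasets.MNIST(
    "../data", train=True, download=True, transform=transform
)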

Update: According to torchvision issue 3549, this will be fixed in the next minor release.
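
The list comprehension in the hot fix above keeps each file name and checksum and only swaps the host. A standalone sketch of the same rewrite, runnable without torchvision, makes the effect visible. The function name `remap_to_mirror` is hypothetical, and the sample entry reuses a URL and checksum that appear elsewhere on this page.

```python
def remap_to_mirror(resources, new_mirror):
    """Return (url, md5) pairs with every URL re-rooted at `new_mirror`."""
    base = new_mirror.rstrip('/')
    return [
        # Keep only the file name (last path component) and the checksum.
        ('/'.join([base, url.split('/')[-1]]), md5)
        for url, md5 in resources
    ]


old_resources = [
    ('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
     'f68b3c2dcbeaaa9fbdd348bbdeb94873'),
]
new_resources = remap_to_mirror(
    old_resources, 'https://ossci-datasets.s3.amazonaws.com/mnist'
)
print(new_resources[0][0])
# https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz
```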

This problem has reportedly been solved in torchvision==0.9.1. If you cannot upgrade yet, use the following workaround as a temporary solution:

import torch
from torchvision import datasets, transforms
datasets.MNIST.resources = [
    ('https://ossci-datasets.s3.amazonaws.com/mnist/train-images-idx3-ubyte.gz', 'f68b3c2dcbeaaa9fbdd348bbdeb94873'),
    ('https://ossci-datasets.s3.amazonaws.com/mnist/train-labels-idx1-ubyte.gz', 'd53e105ee54ea40749a09fcbcd1e9432'),
    ('https://ossci-datasets.s3.amazonaws.com/mnist/t10k-images-idx3-ubyte.gz', '9fb629c4189551a2d022fa330f9573f3'),
    ('https://ossci-datasets.s3.amazonaws.com/mnist/t10k-labels-idx1-ubyte.gz', 'ec29112dd5afa0611ce80d1b7f02629c')
]

# AND the rest of your code as usual for train and test (EXAMPLE):
batch_sz = 100
tr_ = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))])
# MNIST
train_dataset = datasets.MNIST(
    root='./dataset', 
    train=True, 
    transform=tr_,  
    download=True
)

test_dataset = datasets.MNIST(
    root='./dataset', 
    train=False, 
    transform=tr_  
)
# DataLoader
train_loader = torch.utils.data.DataLoader(
    dataset=train_dataset,
    batch_size=batch_sz,
    shuffle=True 
)

test_loader = torch.utils.data.DataLoader(
    dataset=test_dataset,
    batch_size=batch_sz,
    shuffle=False 
)
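
Each entry in `resources` pairs a URL with an MD5 checksum, which makes it possible to check a downloaded file for corruption. Below is a minimal standard-library sketch of such a check. `file_md5` and `matches_md5` are hypothetical helper names, not torchvision functions.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    md5 = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            md5.update(chunk)
    return md5.hexdigest()


def matches_md5(path, expected):
    """Return True if the file at `path` has the expected MD5 digest."""
    return file_md5(path) == expected.lower()
```

With `root='./dataset'` as above, the raw archives should land under `./dataset/MNIST/raw/`, so a call such as `matches_md5('./dataset/MNIST/raw/train-images-idx3-ubyte.gz', 'f68b3c2dcbeaaa9fbdd348bbdeb94873')` can confirm that the training images arrived intact.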

You can try this:

from sklearn.datasets import fetch_openml

# as_frame=False returns NumPy arrays, which the reshape below requires
mnist = fetch_openml('mnist_784', data_home=".", as_frame=False)

x = mnist.data
x = x.reshape((-1, 28, 28))
x = x.astype('float32')

y = mnist.target
y = y.astype('float32')

import tensorflow as tf
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()

Use that.

You did nothing wrong. The problem lies with the platform that hosts the data. With PyTorch, you can download MNIST using the code below:

import torch
import torchvision
from torchvision.datasets import MNIST

# Download training dataset
dataset = MNIST(root='data/', download=True)

The MNIST wrapper in torchvision's datasets tries several locations where the data is available. After running the code, you can see that it first tries to download from Yann LeCun's site, fails, and then falls back to the other available options.
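
The fallback behaviour described above can be sketched in a few lines. This illustrates the idea only and is not torchvision's actual implementation: `download_with_fallback` is a hypothetical helper, and the mirror list and its order are assumptions based on the URLs mentioned on this page.

```python
import urllib.error
import urllib.request

# Assumed mirror order: the original site first, then the AWS copy.
MIRRORS = [
    "http://yann.lecun.com/exdb/mnist/",
    "https://ossci-datasets.s3.amazonaws.com/mnist/",
]


def download_with_fallback(filename, dest, mirrors=MIRRORS,
                           fetch=urllib.request.urlretrieve):
    """Try each mirror in turn and return the URL that succeeded."""
    errors = []
    for mirror in mirrors:
        url = mirror + filename
        try:
            fetch(url, dest)
            return url
        except urllib.error.URLError as exc:
            # HTTPError (for example a 503) is a subclass of URLError.
            errors.append((url, exc))
    raise RuntimeError(f"all mirrors failed for {filename}: {errors}")
```

The `fetch` parameter exists so that the download function can be swapped out, for instance for a stub when testing without network access.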

Potential cause: Yann LeCun's site is missing an up-to-date SSL certificate, so download methods that verify the certificate fail, while those that skip the check succeed.
