"HTTP Error 403: Forbidden" error when downloading the MNIST dataset

Question

I use the following code to get the MNIST dataset:

import torchvision.datasets
MNIST_train = torchvision.datasets.MNIST('./', download=True, train=True)

This code used to work, but now it fails with the following error:

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./MNIST\raw\train-images-idx3-ubyte.gz
HTTP Error 403: Forbidden
Stack trace:
 >  File "C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\Lib\urllib\request.py", line 650, in http_error_default
 >    raise HTTPError(req.full_url, code, msg, hdrs, fp)

Answer 1

Following the suggestion mentioned here, adding this to the top of my script worked:

from six.moves import urllib    
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
urllib.request.install_opener(opener)
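
On Python 3 the same workaround can be written without `six`, because `six.moves.urllib` maps to the standard `urllib` package there. A minimal sketch, assuming the server only rejects requests that lack a browser-like User-Agent (the `Mozilla/5.0` value is the one used above):

```python
import urllib.request

# Build an opener whose default headers include a browser-like User-Agent.
opener = urllib.request.build_opener()
opener.addheaders = [("User-agent", "Mozilla/5.0")]

# Install it globally so that later urllib.request.urlopen / urlretrieve
# calls made in this process send the header as well.
urllib.request.install_opener(opener)
```

This has to run before the dataset is constructed, since the opener only affects requests issued after it is installed.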

Answer 2

It seems you may have to add a header to your urllib request, because that site has moved behind Cloudflare protection.

For example:

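import urllib.request

# url, fpath, some_user_agent and gen_bar_updater are placeholders defined elsewhere.
# Note: URLopener has been deprecated since Python 3.3.
opener = urllib.request.URLopener()
opener.addheader('User-Agent', some_user_agent)
opener.retrieve(
    url, fpath,
    reporthook=gen_bar_updater()
)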
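
Since `urllib.request.URLopener` has been deprecated since Python 3.3, it may be safer to attach the header to a `Request` object instead. The following is only a sketch of that alternative, not the code from the linked discussion; the function names `build_request` and `download` and the default `Mozilla/5.0` value are assumptions made for illustration.

```python
import shutil
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request carrying an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, dest_path, user_agent="Mozilla/5.0"):
    """Stream `url` to `dest_path`, sending the given User-Agent."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

A call such as `download(url, fpath)` would then replace the `opener.retrieve(...)` call, without the progress bar.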

This problem is also discussed in a GitHub issue for PyTorch here, which lists a few solutions.

One of the more complete Python 3 solutions given there is as follows:

from torchvision import datasets
import torchvision.transforms as transforms
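import urllib.request  # plain `import urllib` does not guarantee the request submodule is loaded

num_workers = 0
batch_size = 20
# Placeholder; for torchvision to reuse the files this should point at <root>/MNIST/raw
basepath = 'some/base/path'
transform = transforms.ToTensor()

def set_header_for(url, filename):
    # Download `url` to basepath/filename while sending a browser-like User-Agent
    opener = urllib.request.URLopener()
    opener.addheader('User-Agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36')
    opener.retrieve(url, f'{basepath}/{filename}')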

set_header_for('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', 'train-images-idx3-ubyte.gz')
set_header_for('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', 'train-labels-idx1-ubyte.gz')
set_header_for('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', 't10k-images-idx3-ubyte.gz')
set_header_for('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', 't10k-labels-idx1-ubyte.gz')
train_data = datasets.MNIST(root='data', train=True,
                                   download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False,
                                  download=False, transform=transform)

The helper function sets the header for each download, so the opener setup does not have to be repeated for every file.
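
The pre-downloaded files only help if they land where torchvision looks for them. The log in the question shows the target `./MNIST\raw\train-images-idx3-ubyte.gz`, that is, `<root>/MNIST/raw/`. The sketch below computes those paths for the four files and skips any that already exist. The names `mnist_targets` and `fetch_missing` are hypothetical, and `fetch` stands for any downloader that accepts a URL and a destination path.

```python
import os

BASE_URL = "http://yann.lecun.com/exdb/mnist/"
FILENAMES = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]


def mnist_targets(root):
    """Return (url, local_path) pairs under <root>/MNIST/raw."""
    raw_dir = os.path.join(root, "MNIST", "raw")
    return [(BASE_URL + name, os.path.join(raw_dir, name)) for name in FILENAMES]


def fetch_missing(root, fetch):
    """Call fetch(url, path) for every file not yet on disk; return fetched paths."""
    fetched = []
    for url, path in mnist_targets(root):
        if os.path.exists(path):
            continue  # already downloaded
        os.makedirs(os.path.dirname(path), exist_ok=True)
        fetch(url, path)
        fetched.append(path)
    return fetched
```

With `root='data'`, as in the `datasets.MNIST` calls above, the files would be written to `data/MNIST/raw/`.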

Answer 3

I looked it up, and the problem is that the folder has been moved behind Cloudflare protection, as one of the commenters mentions here: https://github.com/pytorch/vision/issues/1938 .

That thread also explains how to fix the issue by adding headers. I hope it helps.

Answer 1
13 2021-03-03 16:29:27

Answer 2
4 2020-03-05 14:54:59

Answer 3
2 2020-03-05 14:55:09
