
MNIST data download from sklearn datasets gives Timeout error

I am new to ML and am trying to download the MNIST data. The code I am using is:

from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')

But it gives an error saying:

TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

Can anyone please tell me what needs to be done to fix this issue?

Here is the issue, along with some workarounds that people suggested:

https://github.com/scikit-learn/scikit-learn/issues/8588

The easiest one was to download the .mat file of MNIST with this download link:

download MNIST.mat

After downloading, put the file inside the ~/scikit_learn_data/mldata folder. If this folder doesn't exist, create it and put Mnist.mat inside it. Once the file is available locally, scikit-learn won't download it again and will use that file instead.
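The manual step above can be sketched in a few lines of Python. This is only an illustration: `install_mat_file` is a hypothetical helper name, the default data home of `~/scikit_learn_data` is taken from the answer above, and the file keeps whatever name it was downloaded under.

```python
import shutil
from pathlib import Path


def install_mat_file(downloaded_file, data_home=None):
    """Copy a manually downloaded .mat file into <data_home>/mldata.

    `data_home` defaults to ~/scikit_learn_data, the location named in the
    answer above (an assumption; your installation may be configured differently).
    """
    if data_home is None:
        data_home = Path.home() / "scikit_learn_data"
    target_dir = Path(data_home) / "mldata"
    # Create the folder (and any missing parents) if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / Path(downloaded_file).name
    shutil.copy2(downloaded_file, target)
    return target


# Example call (the source path is a placeholder for wherever you saved the file):
# install_mat_file(Path.home() / "Downloads" / "Mnist.mat")
```

The function returns the final path, which makes it easy to confirm where the file ended up.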

Since fetch_mldata has been deprecated, we have to move to fetch_openml. Make sure to update scikit-learn to version 0.20.0 or later so that fetch_openml is available.
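A quick way to check the version requirement is to compare the installed version string against 0.20. The sketch below uses two hypothetical helpers, `version_tuple` and `has_fetch_openml`, and only looks at the leading numeric parts of the version, so treat it as a rough check rather than a full version parser.

```python
import re


def version_tuple(version):
    """Turn '0.20.0' or '1.4.dev0' into a tuple of its leading integer parts."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def has_fetch_openml(version):
    """True if the version is 0.20 or later, the requirement stated above."""
    return version_tuple(version) >= (0, 20)


# Typical use with the installed library:
# import sklearn
# print(has_fetch_openml(sklearn.__version__))
```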

  1. OpenML currently hosts 5 different datasets related to MNIST. Here is one example from sklearn's documentation using the mnist_784 dataset.
from sklearn.datasets import fetch_openml
# Load data from https://www.openml.org/d/554
X, y = fetch_openml('mnist_784', version=1, return_X_y=True)
  2. Or, if you don't need a very large dataset, you can use load_digits:
from sklearn.datasets import load_digits
mnist = load_digits()

Note that if you are following the book Hands-On Machine Learning with Scikit-Learn and TensorFlow with the mnist_784 dataset, you may notice that the code

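import matplotlib
import matplotlib.pyplot as plt

# X is the feature array loaded with fetch_openml above
some_digit = X[36000]
some_digit_image = some_digit.reshape(28, 28)
plt.imshow(some_digit_image, cmap=matplotlib.cm.binary, interpolation="nearest")
plt.axis('off')
plt.show()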

returns a picture of a 9 instead of a 5. My guess is that either mnist_784 and MNIST original are two different subsets of the NIST data, or the order of the data differs between the two datasets.
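If the difference is only one of ordering (the second guess above, which is an assumption here), one possible workaround is to reorder the data by label. The sketch below uses a hypothetical helper, `sort_by_target`, that sorts the first `n_train` rows and the remaining rows separately, and it is shown on a tiny synthetic array rather than on MNIST itself.

```python
import numpy as np


def sort_by_target(X, y, n_train):
    """Sort the first n_train rows and the remaining rows by label, separately."""
    y = np.asarray(y)
    # A stable sort keeps the original relative order of rows with the same label.
    train_order = np.argsort(y[:n_train], kind="stable")
    test_order = np.argsort(y[n_train:], kind="stable") + n_train
    order = np.concatenate([train_order, test_order])
    return X[order], y[order]


# Tiny synthetic example: 6 samples, the first 4 treated as the training part.
X_demo = np.arange(12).reshape(6, 2)
y_demo = np.array([3, 1, 2, 9, 0, 4])
X_sorted, y_sorted = sort_by_target(X_demo, y_demo, n_train=4)
print(y_sorted)
```

For the real dataset one would pass the arrays returned by fetch_openml and the size of the training split (60000 in the book's setup). Whether index 36000 then shows the same digit as in the book is not guaranteed by this sketch.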

PS: I encountered an SSL error when I was trying to load the data. In my case, updating openssl resolved the problem.

Though I am not sure why you're getting the error, you can try the following possible ways to fix it.

  1. Sometimes the data can be corrupted during the first download. In that case, you need to clear the cache, which lives in the scikit-learn data home directory. To get this directory, you can use:

     from sklearn.datasets import get_data_home
     print(get_data_home())

Now clean the directory and re-download.

  2. If the problem still persists, you can refer to the following links and use some trial and error to diagnose your issue.

https://github.com/ageron/handson-ml/issues/143

https://github.com/scikit-learn/scikit-learn/issues/8588

https://github.com/ageron/handson-ml/issues/8
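The cache-clearing step from item 1 can be sketched with the standard library. `empty_directory` is a hypothetical helper name. It deletes everything inside the given folder, so double-check the path before running it. scikit-learn also provides `sklearn.datasets.clear_data_home`, which may be simpler if you want to wipe the whole data home.

```python
import shutil
from pathlib import Path


def empty_directory(directory):
    """Delete everything inside `directory` but keep the directory itself."""
    directory = Path(directory)
    removed = []
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove sub-folders recursively
        else:
            entry.unlink()        # remove plain files and symlinks
        removed.append(entry.name)
    return sorted(removed)


# Typical use with scikit-learn's data home:
# from sklearn.datasets import get_data_home
# print(empty_directory(get_data_home()))
```

After this, the next call to the dataset loader should download a fresh copy.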

If you still face the problem, please provide the detailed traceback to help me identify it.

Thanks!!

If your sklearn version is lower than 0.19, "fetch_mldata" will not work. You need to upgrade sklearn to version 0.23 and use fetch_openml there, since fetch_mldata is no longer available in that release.
