
ImageNet dataset images not loading properly

I see that we can't download the ImageNet dataset through PyTorch directly anymore. I get this error:

RuntimeError: The dataset is no longer publicly accessible. You need to download the archives externally and place them in the root directory.
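For reference, here is roughly what triggers this error and what torchvision expects instead (a minimal sketch; the root path is just an example, and the archive filenames are the standard ILSVRC2012 names torchvision looks for):

from torchvision import datasets

# This is what raises the RuntimeError above -- download=True is no
# longer supported for ImageNet:
# train_set = datasets.ImageNet(root="imagenet", split="train", download=True)

# Instead, the archives (ILSVRC2012_img_train.tar, ILSVRC2012_img_val.tar,
# ILSVRC2012_devkit_t12.tar.gz) have to be downloaded manually from
# image-net.org and placed in the root directory first:
train_set = datasets.ImageNet(root="imagenet", split="train")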

So I went on the website and downloaded the 32x32 images (why is it so slow to download?). The training data came in batches, and when I loaded one of them to see what the images look like, I get this:

(image: the plotted output, which looks garbled rather than like a photo)

Here's how I loaded the image:

file_1 = np.load("imagenet/Imagenet32_train_npz/train_data_batch_1.npz")
img = file_1['data'][0]
img = np.reshape(img, (32,32,3))
plt.imshow(img)
plt.show()

Am I doing something wrong, or has ImageNet itself changed? Let me know.

I've faced the same problem, and I found that the ImageNet data is channel-first, which means that instead of reshaping into (32, 32, 3) you should reshape into (3, 32, 32) and then transpose. The full code would look like this:

file_1 = np.load("yourpath" , allow_pickle=True)
images = file_1["data"].reshape(-1  , 3 , 32 , 32)
images = images.transpose(0 , -2 , -1 , 1)
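To check the result, plotting the first image should now show a recognizable picture instead of noise (a short usage sketch; assumes matplotlib and that the file is one of the Imagenet32 .npz training batches as above):

import matplotlib.pyplot as plt

plt.imshow(images[0])   # a recognizable 32x32 image, not garbage
plt.axis("off")
plt.show()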

Image Gate. In 2019, after inaccuracies affecting data quality were discovered, ImageNet was taken partially offline for "repair" (80 Million Tiny Images followed in 2020). Their maintainers suggest folks avoid using them until the problems are fixed.

What went wrong: the image set was built without any human actually looking at the data. While that sounds like an achievement for AI, it turns out people (e.g. trolls) put all kinds of garbage in the "alt-text" part of an image, which was used as input to automatically build the categories/index. The authors (most famously Fei-Fei Li) also outsourced the labeling task to Mechanical Turk*. They describe this dataset as "crowd-sourced", but that is completely misleading. Since there was no human vetting or inspection of the data, these troll labels (e.g. n r, c t, etc.) went into the data as-is, and showed up later, to great harm, in thousands of applications. Most of the trouble (so far) came from the categories involving "people", approximately 3k categories.

What's going on right now? Once the creators got wind of the harm, they got money from the NSF to fix the problem (2018/2019 onward). Of these people categories, over half of the images have been removed. However, there is still no vetting to determine whether what the alt-text says matches what is actually in the image. These comments relate only to the people categories; I have heard nothing about errors in the other categories.

There is great concern about this issue because these image databases are widespread at this point. The issue affects more than just face recognition. It has been downplayed to avoid user/programmer panic, but people have a right to know.

*Some of you might know Mechanical Turk from the FB/Cambridge Analytica stories.

Here are some references; dig around yourself and find out more:

https://gizmodo.com/mit-takes-down-popular-ai-dataset-due-to-racist-misogy-1844244206
https://image-net.org/update-sep-17-2019.php
https://internetpolicy.mit.edu/blog-2018-fb-cambridgeanalytica/
Gorey, Colm (2020-07-13). "80m images used to train AI pulled after researchers find string of racist terms". Silicon Republic. Retrieved 2021-11-15.
