Can't seem to read a tar.gz file correctly into Python

Question

I've downloaded a tar.gz file from this site:

http://www.vision.caltech.edu/Image_Datasets/Caltech101/

It's supposed to contain many images. Ideally, I would like to read all the images into a huge np.array in their original dimensions.

Here is one of my attempts:

import tarfile
import numpy as np

images = []

# archive_path: path to the downloaded tar.gz file
with tarfile.open(archive_path, "r:gz") as tar:
    for member in tar.getmembers()[:10]:
        if member.isfile():
            file = tar.extractfile(member)
            images.append(file.read())

Now file.read() returns an object of class 'bytes', and I'm not sure how to read that into a NumPy array.

I've tried

np.array(file.read())  # ValueError: embedded null byte
np.fromfile(file)   # AttributeError: '_FileInFile' object has no attribute 'fileno'

Answer 1

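You could try np.frombuffer (np.fromstring also works on older NumPy versions, but it is deprecated for binary data):

np.frombuffer(file.read(), dtype=np.uint8)

This gives you the bytes as 8-bit unsigned integers. You can change the dtype if you want something else. Note that these are the raw bytes of the still-encoded image file, not decoded pixel values.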
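To make the byte-to-array step concrete, here is a minimal sketch that collects one uint8 array per regular file in the archive. The function name `read_members_as_uint8` and its `archive_path` and `limit` parameters are hypothetical, chosen only for illustration. It uses np.frombuffer, which returns a read-only view of the bytes.

```python
import tarfile

import numpy as np


def read_members_as_uint8(archive_path, limit=None):
    """Return a list with one 1-D uint8 array per regular file in a .tar.gz.

    archive_path and limit are illustrative names, not from the original post.
    """
    arrays = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if limit is not None and len(arrays) >= limit:
                break
            if not member.isfile():
                continue  # skip directories, links, and so on
            data = tar.extractfile(member).read()  # raw bytes of the member
            # Interpret the bytes as unsigned 8-bit integers (read-only view).
            arrays.append(np.frombuffer(data, dtype=np.uint8))
    return arrays
```

Call `.copy()` on an array if you need to modify it. Each array has a different length, since it mirrors the size of the file it came from.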

Edit: I changed 32 bit to 8 bit.
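If the goal is pixel arrays rather than raw file bytes, the bytes have to be decoded by an image library first. The following is a sketch only, assuming the third-party Pillow package is installed and that the archive members are in a format Pillow can open. The name `load_images` and its parameters are hypothetical.

```python
import io
import tarfile

import numpy as np
from PIL import Image  # assumption: Pillow is installed (pip install Pillow)


def load_images(archive_path, limit=None):
    """Decode image members of a .tar.gz into a list of H x W x 3 uint8 arrays."""
    images = []
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            if limit is not None and len(images) >= limit:
                break
            if not member.isfile():
                continue
            data = tar.extractfile(member).read()
            try:
                # Wrap the bytes in a file-like object so Pillow can decode them.
                with Image.open(io.BytesIO(data)) as img:
                    images.append(np.asarray(img.convert("RGB")))
            except OSError:
                continue  # not an image Pillow recognises; skip it
    return images
```

The result is a list rather than a single array on purpose: images kept in their original dimensions can only be stacked into one regular np.array if they all share the same shape.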
