Can't seem to read a tar.gz file correctly into Python

I've downloaded a tar.gz file from this site:

http://www.vision.caltech.edu/Image_Datasets/Caltech101/

It's supposed to contain many images. Ideally, I would like to read all the images into a huge np.array in their original dimensions.

Here is one of my attempts:

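import tarfile
import numpy as np

archive_path = "caltech101.tar.gz"  # placeholder: path to the downloaded archive

images = []

with tarfile.open(archive_path, "r:gz") as tar:
    # Only look at the first 10 members for now
    for member in tar.getmembers()[:10]:
        if member.isfile():
            file = tar.extractfile(member)
            images.append(file.read())  # raw bytes of each file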

Now file.read() returns an object of class 'bytes', and I'm not sure how to read that into a NumPy array.

I've tried

np.array(file.read())  # ValueError: embedded null byte
np.fromfile(file)   # AttributeError: '_FileInFile' object has no attribute 'fileno'

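You could try np.frombuffer (np.fromstring is deprecated for binary data):

np.frombuffer(file.read(), dtype=np.uint8)

This gives you the bytes as 8-bit unsigned integers; you can change the dtype if you want something else. Note that the resulting array holds the file's raw, still-encoded bytes, not decoded pixel values.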
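As a rough, self-contained sketch of the bytes-to-array step, the snippet below builds a tiny .tar.gz in memory and turns each member into a 1-D uint8 array. It uses np.frombuffer, the non-deprecated counterpart of np.fromstring for binary input. The helper names (make_demo_archive, read_members_as_uint8) and the member names are made up for illustration and are not part of the Caltech101 archive.

```python
import io
import tarfile

import numpy as np


def make_demo_archive() -> bytes:
    """Build a small in-memory .tar.gz with two fake binary members (demo data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, payload in [("a.bin", b"\x00\x01\x02"), ("b.bin", b"\xff\x00")]:
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_members_as_uint8(archive_bytes: bytes) -> dict:
    """Return {member name: 1-D uint8 array of that member's raw bytes}."""
    arrays = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                data = tar.extractfile(member).read()
                # frombuffer reinterprets the bytes without copying; the result is read-only
                arrays[member.name] = np.frombuffer(data, dtype=np.uint8)
    return arrays


arrays = read_members_as_uint8(make_demo_archive())
print(arrays["a.bin"])
```

For a real archive on disk you would presumably pass the path to tarfile.open directly instead of wrapping bytes in io.BytesIO. Because the arrays returned by np.frombuffer are read-only, call .copy() on one if you need to modify it.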

Edit: I changed 32 bit to 8 bit.
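Reading a member's bytes into a uint8 array gives the encoded file contents rather than pixels. To get each image in its original dimensions, the bytes have to be decoded by an image library first. The sketch below assumes Pillow is installed (a third-party package that the original post does not mention) and uses small generated PNGs in place of the real dataset; the helper and member names are hypothetical.

```python
import io
import tarfile

import numpy as np
from PIL import Image  # assumption: Pillow is available (pip install Pillow)


def make_demo_image_archive() -> bytes:
    """Build an in-memory .tar.gz holding two tiny PNGs of different sizes (demo data only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, size in [("small.png", (4, 3)), ("large.png", (8, 5))]:  # (width, height)
            png = io.BytesIO()
            Image.new("RGB", size, color=(255, 0, 0)).save(png, format="PNG")
            payload = png.getvalue()
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def decode_images(archive_bytes: bytes) -> list:
    """Decode every regular file in the archive into a (height, width, channels) array."""
    images = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile():
                data = tar.extractfile(member).read()
                # Image.open needs a seekable file-like object, so wrap the bytes
                with Image.open(io.BytesIO(data)) as img:
                    images.append(np.array(img))
    return images


images = decode_images(make_demo_image_archive())
print([im.shape for im in images])
```

The result is kept as a Python list because images of different sizes cannot be stacked into one rectangular array without resizing or padding them first.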
