How to read files in a folder within a zipped folder in Python

I have a zip archive that contains a subfolder with more than 60,000 images. Is there a way to read all the images in the subfolder without extracting the archive? (The image folder is about 100 GB.)

I was thinking of using Python's zipfile package. However, I can't see how to use the module's open function here, because I don't know how to iterate over the whole subfolder. Any pointers on how to do this would be appreciated. So far I can only open a single image by name:

import zipfile

with zipfile.ZipFile("/home/diliptmonson/abc.zip", "r") as zip_ref:
    train_images = zip_ref.open('train/86760c00-21bc-11ea-a13a-137349068a90.jpg')

You may use the following solution:

  • Open the zip file and iterate over its contents, as described in the article linked in the code below.
  • Check that the file extension is .jpg.
  • Read the binary data of that specific member (a file within the folder) from the zip.
  • Decode the binary data into an image using cv2.imdecode.

Here is the code:

from zipfile import ZipFile
import numpy as np
import cv2
import os

# https://thispointer.com/python-how-to-get-the-list-of-all-files-in-a-zip-archive/
with ZipFile("abc.zip", "r") as zip_ref:
   # Get list of files names in zip
   list_of_files = zip_ref.namelist()

   # Iterate over the file names in the archive
   for elem in list_of_files:
       #print(elem)
       ext = os.path.splitext(elem)[-1]  # Get extension of elem

       if ext == ".jpg":
           # Read data in case extension is ".jpg"
           in_bytes = zip_ref.read(elem)

           # Decode bytes to image.
           # np.frombuffer replaces np.fromstring, whose binary mode is deprecated.
           img = cv2.imdecode(np.frombuffer(in_bytes, np.uint8), cv2.IMREAD_COLOR)

           # Show image for testing
           cv2.imshow('img', img)
           cv2.waitKey(1000)

cv2.destroyAllWindows()
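
The loop above accepts every .jpg anywhere in the archive. If only one subfolder is wanted (such as train/ in the question), a minimal sketch along these lines could filter by path prefix. The helper name iter_folder_members and its parameters are made up for illustration; it relies only on the standard zipfile calls infolist(), ZipInfo.is_dir() and read(), and it matches extensions case-insensitively.

```python
import os
import zipfile


def iter_folder_members(zip_path, folder, extensions=(".jpg", ".jpeg")):
    """Yield (name, raw_bytes) for each matching file under `folder`.

    Hypothetical helper: nothing is extracted to disk, and only one
    member's bytes are held in memory at a time.
    """
    prefix = folder.rstrip("/") + "/"  # zip member names always use "/"
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        for info in zip_ref.infolist():
            # Skip directory entries and anything outside the subfolder
            if info.is_dir() or not info.filename.startswith(prefix):
                continue
            ext = os.path.splitext(info.filename)[-1].lower()
            if ext in extensions:
                yield info.filename, zip_ref.read(info)
```

Each yielded bytes object can then be passed to cv2.imdecode exactly as in the code above, for example by looping with for name, data in iter_folder_members("abc.zip", "train").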

Inside the with block from the question, use a for loop:

# namelist() returns the names of all members in the archive
for file in zip_ref.namelist():
   opened_file = zip_ref.open(file)
   # do stuff with your file 
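
As a rough illustration of what "do stuff" might look like: zip_ref.open returns a binary file-like object, so a member can be read in chunks rather than all at once. The sketch below simply counts the uncompressed bytes of each member; the function name member_sizes and the chunk size are assumptions made for this example, not part of the answer.

```python
import zipfile


def member_sizes(zip_path, chunk_size=1024 * 1024):
    """Return {member_name: uncompressed_byte_count}, reading in chunks.

    Hypothetical helper showing the file-like object from ZipFile.open().
    """
    sizes = {}
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        for info in zip_ref.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            total = 0
            # The with block closes the member handle after reading
            with zip_ref.open(info) as opened_file:
                while True:
                    chunk = opened_file.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
            sizes[info.filename] = total
    return sizes
```

The same pattern applies to any consumer that accepts a file-like object, so the opened member can often be handed over directly instead of being read into memory first.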
