
How to read contents of zip file in memory on a file upload in python?

I have a zip file that I receive when the user uploads a file. The zip contains a single json file, which I want to read and process without first writing the zip file to disk, unzipping it, and then reading the content of the inner file.

Currently I only know the longer process, which looks something like this:

import json
import zipfile

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

@csrf_exempt
def get_zip(request):
    if request.method == "POST":
        try:
            client_file = request.FILES['file']
            file_path = "/some/path/"
            # first dump the zip file to a directory
            with open(file_path + client_file.name, 'wb+') as dest:
                for chunk in client_file.chunks():
                    dest.write(chunk)

            # unzip the zip file to the same directory
            with zipfile.ZipFile(file_path + client_file.name, 'r') as zip_ref:
                zip_ref.extractall(file_path)

            # at this point we get a json file from the zip, say `test.json`
            # read the json file content
            with open(file_path + "test.json", "r") as fo:
                json_content = json.load(fo)
            doSomething(json_content)
            return HttpResponse(0)

        except Exception as e:
            return HttpResponse(1)

As you can see, this takes 3 steps to finally get the content from the zip file into memory. What I want is to get the content of the zip file and load it directly into memory.

I did find some similar questions on Stack Overflow, like this one: https://stackoverflow.com/a/2463819 . But I am not sure at what point I should invoke the operation mentioned in that post.

How can I achieve this?

Note: I am using Django in the backend. There will always be exactly one json file in the zip.

The first argument to zipfile.ZipFile() can be a file object rather than a pathname. I think the Django UploadedFile object supports this use, so you can read directly from that rather than having to copy into a file.

You can also open the file directly from the zip archive rather than extracting that into a file.

import json
import zipfile

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

@csrf_exempt
def get_zip(request):
    if request.method == "POST":
        try:
            client_file = request.FILES['file']
            # open the uploaded zip directly, without writing it to disk
            with zipfile.ZipFile(client_file, 'r') as zip_ref:
                # the zip holds a single json file, so take the first entry
                first = zip_ref.infolist()[0]
                with zip_ref.open(first, "r") as fo:
                    json_content = json.load(fo)
            doSomething(json_content)
            return HttpResponse(0)

        except Exception as e:
            return HttpResponse(1)
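
A framework-independent sketch of the same idea may make it easier to try outside Django. The helper name load_json_from_zip is hypothetical, and it assumes the archive holds exactly one .json member, as stated in the question. It looks the member up by extension instead of relying on it being the first entry, and the in-memory archive at the bottom only stands in for a real upload.

```python
import io
import json
import zipfile


def load_json_from_zip(fileobj):
    """Parse the single .json member of a zip archive without touching disk.

    `fileobj` is any seekable binary file-like object (hypothetical helper;
    an uploaded file or an io.BytesIO should both qualify).
    """
    with zipfile.ZipFile(fileobj) as archive:
        json_names = [n for n in archive.namelist() if n.lower().endswith(".json")]
        if len(json_names) != 1:
            raise ValueError("expected exactly one .json file, found %d" % len(json_names))
        with archive.open(json_names[0]) as member:
            return json.load(member)


# Build a small archive in memory to stand in for an upload.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("test.json", json.dumps({"answer": 42}))
buffer.seek(0)

print(load_json_from_zip(buffer))
```

In a view, request.FILES['file'] would presumably be passed in place of buffer.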

As I understand it, what @jason is suggesting is to first open a ZipFile, just as you already do with zipfile.ZipFile(file_path + client_file.name, 'r') as zip_ref: . The documentation says:

class zipfile.ZipFile(file[, mode[, compression[, allowZip64]]])

  Open a ZIP file, where file can be either a path to a file (a string) or a file-like object.

Then use io.BytesIO to wrap the raw bytes in a file-like object. Note that the file on disk has to be opened in binary mode ('rb') with open(); the 'r' passed to zipfile.ZipFile is a separate argument that only means the archive is opened for reading. For example:

import io
import zipfile

with open(filename, 'rb') as file_data:
    bytes_content = file_data.read()
    file_like_object = io.BytesIO(bytes_content)
    zipfile_ob = zipfile.ZipFile(file_like_object)

Now zipfile_ob can be accessed from memory.
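
The same wrapping can be applied to bytes that never came from a file on disk, which is closer to the upload case in the question. The sketch below assumes the data is small enough to hold in memory at once. zip_from_bytes is a hypothetical helper name, and in a Django view the bytes would presumably come from something like client_file.read().

```python
import io
import zipfile


def zip_from_bytes(data):
    """Wrap raw zip bytes in a BytesIO so ZipFile can read them from memory."""
    return zipfile.ZipFile(io.BytesIO(data))


# Stand-in for the uploaded bytes: create a one-member archive in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("test.json", '{"ok": true}')
payload = buffer.getvalue()

with zip_from_bytes(payload) as zipfile_ob:
    names = zipfile_ob.namelist()
    content = zipfile_ob.read("test.json")

print(names, content)
```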
