
How to unzip without writing to disk?

I have a zip archive, path_to_zip_file, on a read-only system. I need to open a CSV file, testfile.csv, that is stored inside the archive. Note that the archive contains many other files, but I only want this one CSV file. My goal is to load its content into a pandas DataFrame df.

My code is shown below. Is there any way to change it so that it can run on a read-only system? In other words, how can I do this in memory without writing to disk?

import zipfile
import pandas as pd

path_to_zip_file = "data/test.zip"
directory_to_extract_to = "result"
with zipfile.ZipFile(path_to_zip_file, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)

csv_file_name = "testfile.csv"
df = pd.read_csv("{}/{}".format(directory_to_extract_to,csv_file_name), index_col=False)

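A simple option is to extract the archive into /tmp. On many Linux systems /tmp is a tmpfs mount held in RAM, but this is not guaranteed: elsewhere it is an ordinary on-disk directory, and either way it has to be writable for this to work. You could also use Python's tempfile module to create a temporary directory and extract there (by default it will usually be created under /tmp as well).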
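The tempfile idea above could look roughly like the sketch below. It assumes the system still has a writable temporary location (which a fully read-only system may not), and the helper name read_csv_via_tempdir is made up for illustration. Only the one CSV member is extracted, and the temporary directory is deleted when the with block ends.

```python
import tempfile
import zipfile

import pandas as pd


def read_csv_via_tempdir(path_to_zip_file, csv_file_name):
    """Extract a single member into a temporary directory and load it with pandas.

    Hypothetical helper: requires a writable temp location (often /tmp).
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        with zipfile.ZipFile(path_to_zip_file) as zip_ref:
            # extract() writes only the named member and returns its path.
            extracted_path = zip_ref.extract(csv_file_name, path=tmp_dir)
        # Read while the temporary directory still exists.
        return pd.read_csv(extracted_path, index_col=False)
```

Usage would be df = read_csv_via_tempdir("data/test.zip", "testfile.csv"). This still writes the file somewhere, so it only helps when a writable temp directory is available.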

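You can skip extraction entirely. ZipFile.open on the already opened archive returns a file-like object for a single member, and pandas can read from it directly, so nothing is written to disk: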

import zipfile
import pandas as pd

with zipfile.ZipFile("data/test.zip") as archive:
    # Open only the wanted member; nothing is extracted to disk.
    with archive.open("testfile.csv") as csv_file:
        df = pd.read_csv(csv_file, index_col=False)

print(df)
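If the CSV sits in a subfolder inside the archive, or the archive itself is only available as bytes in memory, a small wrapper around the same ZipFile.open call may help. This is a sketch under those assumptions; the function name read_csv_from_zip and the match-by-base-name rule are illustrative choices, not part of zipfile or pandas.

```python
import io
import zipfile

import pandas as pd


def read_csv_from_zip(zip_source, csv_file_name):
    """Load one CSV member of a zip archive into a DataFrame without extracting it.

    zip_source may be a path or a binary file-like object such as io.BytesIO.
    """
    with zipfile.ZipFile(zip_source) as archive:
        # Compare base names so members stored in subfolders are found too.
        matches = [name for name in archive.namelist()
                   if name.rsplit("/", 1)[-1] == csv_file_name]
        if not matches:
            raise FileNotFoundError(f"{csv_file_name} not found in archive")
        # Takes the first match; the member is streamed, not written to disk.
        with archive.open(matches[0]) as csv_file:
            return pd.read_csv(csv_file, index_col=False)
```

For the question's data this would be called as df = read_csv_from_zip("data/test.zip", "testfile.csv"). If the zip content is already held as bytes, wrap it first: read_csv_from_zip(io.BytesIO(zip_bytes), "testfile.csv"), where zip_bytes is a placeholder for your data.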
