Work with multiple ZIP files in a directory, converting and renaming the files with Python

Question

I have a directory that contains a lot of ZIP files, and each ZIP file contains a lot of CSV files. First, I want to convert the CSV files to parquet. Second, I need to rename all the parquet files and record the original and new file names in a CSV file (code below). I need to work with the ZIP files directly, without extracting them, to save storage space.

Below is the code to convert to .parquet:

import pandas as pd

# `ul` is not defined in the post; get_flist presumably returns the paths of the CSV files
flist = ul.get_flist(r"D:\Proyekan\Data yang udah di extract", "csv")
target_folder = "D:\\Proyekan\\Data yang udah di extract\\Parquet\\"
for i, fpath in enumerate(flist):
    df = pd.read_csv(fpath)
    # keep the base name of the CSV and swap the extension for .parquet
    fname = fpath.split('\\')[-1].split('.')[0] + '.parquet'
    print(f"{i:03} ... Working on file ... {fname}")
    df.to_parquet(f"{target_folder}{fname}", compression="gzip")

And below is the code to rename the files:

import os
import pandas as pd
#This is to rename files

path = r"D:\Proyekan\Data FDM"  # raw string so the backslashes are not read as escapes
count = 1

ori_filename = []
new_filename = []
folder = []
head, tail = os.path.split(path)
for root, dirs, files in os.walk(path):
    for file in files:
        new_filecode = "flight_" + str(1000000 + count) +".mat"

        ori_filename.append(os.path.basename(file))
        new_filename.append(new_filecode)
        folder.append(os.path.basename(root))

        fullpath = os.path.join(root,file)
        os.rename(fullpath, os.path.join(root, new_filecode))

        count += 1

#Store data to csv
df = pd.DataFrame(list(zip(ori_filename, new_filename, folder)), columns = ['raw_file','file_id','tail_number'])
df.to_csv(r'D:\Proyekan\FILES\Metadata.csv',index = False, header = True)

Any ideas on how to edit this code to read ZIP files? Any help would be appreciated.

Answer 1

This is a shot in the dark; try it and let's see how it goes:

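from pathlib import Path
from zipfile import ZipFile

import pandas as pd

zip_dir = Path('folder of zipped files')  # placeholder: directory holding the ZIP files
out_dir = Path('new_location')            # placeholder: where the parquet files go
out_dir.mkdir(parents=True, exist_ok=True)
#iterate through every zip file
for zips in sorted(zip_dir.glob('*.zip')):
    with ZipFile(zips) as myzip:
        for csv in myzip.namelist():
            if not csv.lower().endswith('.csv'):
                continue  #skip folders and non-CSV members
            with myzip.open(csv) as myfile:
                #read the csv straight from the zip into a variable,
                #without extracting it to disk
                df = pd.read_csv(myfile)
            #then write it to parquet, named after the CSV
            df.to_parquet(out_dir / (Path(csv).stem + '.parquet'))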
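
To also cover the renaming and the metadata CSV from the question, the same idea can be extended roughly as follows. This is only a sketch under some assumptions: the function names `plan_conversions` and `convert_all` are made up for illustration, the new files get a `.parquet` extension rather than the `.mat` used in the question's snippet, and the ZIP file's name stands in for the folder name that the question stores as `tail_number`. The numbering mirrors the question's `flight_1000001`, `flight_1000002`, ... scheme.

```python
from pathlib import Path, PurePosixPath
from zipfile import ZipFile


def plan_conversions(zip_dir, start=1_000_001):
    """Yield (zip_path, member, new_name) for every CSV inside every ZIP in zip_dir."""
    count = start
    for zip_path in sorted(Path(zip_dir).glob("*.zip")):
        with ZipFile(zip_path) as archive:
            members = archive.namelist()  # names only, nothing is extracted
        for member in members:
            if member.lower().endswith(".csv"):
                yield zip_path, member, f"flight_{count}.parquet"
                count += 1


def convert_all(zip_dir, out_dir, metadata_csv):
    """Convert each zipped CSV to parquet without extracting it, and log the renames."""
    # Imported here so plan_conversions can be used without pandas;
    # to_parquet additionally needs pyarrow or fastparquet installed.
    import pandas as pd

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = []
    for zip_path, member, new_name in plan_conversions(zip_dir):
        with ZipFile(zip_path) as archive, archive.open(member) as handle:
            df = pd.read_csv(handle)  # reads straight from the archive
        df.to_parquet(out_dir / new_name, compression="gzip")
        # Assumption: the ZIP's name plays the role of the question's folder name.
        rows.append((PurePosixPath(member).name, new_name, zip_path.stem))
    pd.DataFrame(rows, columns=["raw_file", "file_id", "tail_number"]).to_csv(
        metadata_csv, index=False
    )
```

Calling `convert_all` with the ZIP directory, an output directory, and the path for the metadata CSV would then do both steps in one pass. It may be worth running `list(plan_conversions(...))` first to check the planned names before anything is written.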

Solution 1
0 2020-04-13 09:59:36
