
Working with multiple ZIP files inside a directory: convert and rename the files with Python

I have a directory that contains a lot of ZIP files, and each ZIP file contains a lot of CSV files. First, I want to convert the CSV files to Parquet. Second, I need to rename all the Parquet files and store the renaming metadata (original name, new name, and folder) in a CSV file (code below). I need to work with the ZIP files directly rather than extracting them, to save storage space.

Below is the code to convert the CSV files to .parquet:

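import os
import glob
import pandas as pd

# Assumption: ul.get_flist returned the .csv files in this folder; glob is used in its place
source_folder = r"D:\Proyekan\Data yang udah di extract"
target_folder = os.path.join(source_folder, "Parquet")
os.makedirs(target_folder, exist_ok=True)
flist = glob.glob(os.path.join(source_folder, "*.csv"))
for i, fpath in enumerate(flist):
    df = pd.read_csv(fpath)
    # Keep the base name (even if it contains dots) and swap the extension for .parquet
    fname = os.path.splitext(os.path.basename(fpath))[0] + ".parquet"
    print(f"{i:03} ... Working on file ... {fname}")
    df.to_parquet(os.path.join(target_folder, fname), compression="gzip")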
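The loop above reads CSV files that have already been extracted. A minimal sketch of reading the CSVs straight out of the ZIP archives instead, with nothing extracted to disk, could use the standard-library `zipfile` module. The helper name `iter_zipped_csvs` is hypothetical, and the sketch assumes the ZIP files sit directly in one folder:

```python
import glob
import os
from zipfile import ZipFile

import pandas as pd


def iter_zipped_csvs(zip_folder):
    """Yield (zip_path, member_name, DataFrame) for every CSV inside every ZIP.

    Hypothetical helper: each member is read through ZipFile.open, so the
    archive is never extracted to disk.
    """
    for zip_path in sorted(glob.glob(os.path.join(zip_folder, "*.zip"))):
        with ZipFile(zip_path) as archive:
            for member in archive.namelist():
                # Skip directory entries and anything that is not a CSV
                if not member.lower().endswith(".csv"):
                    continue
                with archive.open(member) as handle:
                    # pandas accepts the binary file object returned by ZipFile.open
                    yield zip_path, member, pd.read_csv(handle)
```

Each yielded DataFrame can then be written with `df.to_parquet(...)` as in the loop above; writing Parquet requires `pyarrow` or `fastparquet` to be installed.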

And below is the code to rename the files:

import os
import pandas as pd
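# This is to rename files: every file under `path` becomes flight_<id>.mat
path = r"D:\Proyekan\Data FDM"  # raw string so backslashes are never treated as escapes
count = 1

ori_filename = []  # original file names
new_filename = []  # new file names
folder = []        # name of the folder each file was in (tail number)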
for root, dirs, files in os.walk(path):
    for file in files:
        new_filecode = "flight_" + str(1000000 + count) +".mat"

        ori_filename.append(os.path.basename(file))
        new_filename.append(new_filecode)
        folder.append(os.path.basename(root))

        fullpath = os.path.join(root,file)
        os.rename(fullpath, os.path.join(root, new_filecode))

        count += 1

#Store data to csv
df = pd.DataFrame(list(zip(ori_filename, new_filename, folder)), columns = ['raw_file','file_id','tail_number'])
df.to_csv(r'D:\Proyekan\FILES\Metadata.csv',index = False, header = True)
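A file inside a ZIP archive cannot be renamed in place with `os.rename`, so with zipped input the renaming has to happen when each Parquet file is written. A minimal sketch of building the same metadata table directly from the archives, assuming the same `flight_<1000000 + count>` numbering (with a `.parquet` extension) and assuming the tail number is the name of the folder containing each ZIP (mirroring `os.path.basename(root)` above). The function name `plan_renames` is hypothetical, and it only builds the mapping; it does not write any files:

```python
import os
from zipfile import ZipFile

import pandas as pd


def plan_renames(zip_root, start=1):
    """Map every CSV inside every ZIP under zip_root to a new Parquet name.

    Returns a DataFrame with the columns used for Metadata.csv:
    raw_file, file_id, tail_number. Nothing is extracted or written.
    """
    rows = []
    count = start
    for root, dirs, files in os.walk(zip_root):
        dirs.sort()  # walk sub-folders in a stable order so IDs are reproducible
        for zip_name in sorted(files):
            if not zip_name.lower().endswith(".zip"):
                continue
            with ZipFile(os.path.join(root, zip_name)) as archive:
                for member in archive.namelist():
                    if not member.lower().endswith(".csv"):
                        continue
                    rows.append({
                        "raw_file": os.path.basename(member),
                        "file_id": "flight_" + str(1000000 + count) + ".parquet",
                        # Assumption: the folder holding the ZIP is the tail number
                        "tail_number": os.path.basename(root),
                    })
                    count += 1
    return pd.DataFrame(rows, columns=["raw_file", "file_id", "tail_number"])
```

The returned table can be saved with `to_csv` as before, and each `file_id` used as the output file name when the matching member is converted to Parquet. Because every ID is unique, CSVs with the same name in different ZIPs no longer overwrite each other.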

Any ideas on how to edit this code so that it reads the CSV files straight from the ZIP files? Any help would be appreciated.

This is a shot in the dark; try it and let's see how it goes:

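import glob
import os
from zipfile import ZipFile

import pandas as pd

zip_folder = r"path\to\zip\files"         # placeholder: folder containing the ZIP files
target_folder = r"path\to\parquet\files"  # placeholder: where the .parquet files go
# iterate through every zip file in the folder
for zip_path in glob.glob(os.path.join(zip_folder, "*.zip")):
    with ZipFile(zip_path) as myzip:
        for name in myzip.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip folders and non-CSV members
            with myzip.open(name) as myfile:
                # read the csv inside the zip (no extraction) and convert it to parquet,
                # giving each output its own file name instead of one fixed 'new_location'
                out_name = os.path.splitext(os.path.basename(name))[0] + ".parquet"
                pd.read_csv(myfile).to_parquet(os.path.join(target_folder, out_name))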
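When a ZIP archive holds exactly one CSV, the `zipfile` loop is not needed: `pd.read_csv` can read the archive directly (the default `compression="infer"` recognizes the `.zip` extension, and it is passed explicitly here for clarity). pandas requires such an archive to contain only one data file. A small sketch; the helper name `read_single_csv_zip` and the throwaway archive in a temporary folder are for illustration only:

```python
import os
import tempfile
import zipfile

import pandas as pd


def read_single_csv_zip(zip_path):
    """Read a ZIP archive that contains exactly one CSV file, without extracting it."""
    return pd.read_csv(zip_path, compression="zip")


if __name__ == "__main__":
    # Build a one-file archive in a temporary folder just to demonstrate the call
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "flight.zip")
        with zipfile.ZipFile(path, "w") as archive:
            archive.writestr("flight.csv", "alt,speed\n100,250\n200,260\n")
        print(read_single_csv_zip(path))
```

For archives with many CSVs, as in the question, the per-member `ZipFile.open` approach is still required.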

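Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM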