Working with multiple ZIP files inside a directory: converting and renaming the files with Python
I have a directory that contains a lot of ZIP files, and each ZIP file contains a lot of CSV files. First, I want to convert the CSV files to parquet. Second, I need to rename all the parquet files and record the old and new names in a CSV file (code below). I need to work with the ZIP files directly, without extracting them, to save storage space.
Below is the code to convert to .parquet:
flist = ul.get_flist(r"D:\Proyekan\Data yang udah di extract", "csv")
target_folder = "D:\\Proyekan\\Data yang udah di extract\\Parquet\\"
for i, fpath in enumerate(flist):
    #fname = fpath.split('\\')[-1]
    df = pd.read_csv(fpath)
    fname = fpath.split('\\')[-1].split('.')[0] + '.parquet'
    print(f"{i:03} ... Working on file ... {fname}")
    df.to_parquet(f"{target_folder}{fname}", compression="gzip")
And below is the code to rename the files:
import os
import pandas as pd

# This is to rename files
path = r"D:\Proyekan\Data FDM"  # raw string, so the backslashes are not read as escapes
count = 1
ori_filename = []
new_filename = []
folder = []
for root, dirs, files in os.walk(path):
    for file in files:
        new_filecode = "flight_" + str(1000000 + count) + ".mat"
        ori_filename.append(os.path.basename(file))
        new_filename.append(new_filecode)
        folder.append(os.path.basename(root))
        fullpath = os.path.join(root, file)
        os.rename(fullpath, os.path.join(root, new_filecode))
        count += 1

# Store data to csv
df = pd.DataFrame(list(zip(ori_filename, new_filename, folder)), columns=['raw_file', 'file_id', 'tail_number'])
df.to_csv(r'D:\Proyekan\FILES\Metadata.csv', index=False, header=True)
Any ideas on how to edit this code so that it reads the ZIP files? Any help would be appreciated.
This is a shot in the dark; try it and let's see how it goes:
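from pathlib import Path
from zipfile import ZipFile
import pandas as pd

# placeholders: point these at the folder of ZIP files and at the output folder
zip_folder = Path('folder of zipped files')
out_folder = Path('new_location')
out_folder.mkdir(parents=True, exist_ok=True)

# iterate through every zip file
for zip_path in zip_folder.glob('*.zip'):
    with ZipFile(zip_path) as myzip:
        for name in myzip.namelist():
            if not name.lower().endswith('.csv'):
                continue  # skip directories and non-CSV members
            with myzip.open(name) as myfile:
                # read the csv in the zip and convert to parquet
                # if this does not work, break it down: read into a variable first, then write parquet
                pd.read_csv(myfile).to_parquet(out_folder / f'{zip_path.stem}_{Path(name).stem}.parquet')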
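Building on the idea above, here is a minimal sketch that also covers the renaming and the metadata CSV from the question, still without extracting anything to disk. Treat it as a starting point rather than a tested solution for your data. The function names `iter_zipped_csvs` and `convert_zipped_csvs` are made up for illustration, the output files get a `.parquet` extension instead of `.mat`, and using the ZIP file's name as `tail_number` is an assumption (the question's code used the folder name). Writing parquet also needs a parquet engine such as pyarrow installed.

```python
from pathlib import Path
from zipfile import ZipFile

import pandas as pd


def iter_zipped_csvs(zip_dir):
    """Yield (zip_path, member_name, file_handle) for every CSV in every ZIP.

    The handle streams the member straight out of the archive, so nothing
    is extracted to disk. It is only valid until the next item is requested.
    """
    for zip_path in sorted(Path(zip_dir).glob("*.zip")):
        with ZipFile(zip_path) as archive:
            for member in archive.namelist():
                if member.lower().endswith(".csv"):
                    with archive.open(member) as handle:
                        yield zip_path, member, handle


def convert_zipped_csvs(zip_dir, out_dir, metadata_csv, start=1000001):
    """Write each zipped CSV as flight_<n>.parquet and log the name mapping."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = []
    for count, (zip_path, member, handle) in enumerate(iter_zipped_csvs(zip_dir), start=start):
        new_name = f"flight_{count}.parquet"
        pd.read_csv(handle).to_parquet(out_dir / new_name, compression="gzip")
        rows.append({
            "raw_file": Path(member).name,
            "file_id": new_name,
            "tail_number": zip_path.stem,  # assumption: one ZIP per tail number
        })
    metadata = pd.DataFrame(rows, columns=["raw_file", "file_id", "tail_number"])
    metadata.to_csv(metadata_csv, index=False)
    return metadata
```

You would call it as `convert_zipped_csvs(zip_dir, out_dir, metadata_csv)` with your own three paths. Because the new name is assigned at the moment the parquet file is written, there is no separate `os.rename` pass afterwards.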