gzip multiple files in Python

I have to compress a lot of XML files and split them by the date in the file name. For clarification, there is a parser that collects information from each XML file and then moves the file to a backup folder. My code needs to gzip the files according to the date in the filename and group them in a compressed .gz file.

Please find the code below:

import os
import re
import gzip
import shutil
import sys
import time

timestr = time.strftime("%Y%m%d%H%M")
logfile = 'D:\\Coleta\\log_compactador_xml_tar' + timestr + '.log'
ptm_dir = "D:\\PTM\\monitored_programs\\"
count_files_mdc = 0
count_files_3gpp = 0
count_tar = 0

# print(..., file=...) needs an open file object, not a path string
with open(logfile, 'a') as log:
    for subdir, dirs, files in os.walk(ptm_dir):
        for file in files:
            path = os.path.join(subdir, file)
            try:
                backup_files_dir = path.split(sep='\\')[4]
                parser_id = path.split(sep='\\')[3]
                if re.match('backup_files_*', backup_files_dir) and file.endswith('xml'):
                    data_arq = file[1:14]  # date portion of the file name
                    if parser_id == 'parser-924':
                        gzip_filename_mdc = os.path.join(subdir, 'E4G_PM_MDC_IP51_' + timestr + '_' + data_arq)
                        # each XML file is appended to the same .gz output
                        with open(path, 'r') as f_in, gzip.open(gzip_filename_mdc + ".gz", 'at') as f_out_mdc:
                            shutil.copyfileobj(f_in, f_out_mdc)
                        # both files are closed here, so the source can be removed
                        count_files_mdc += 1
                        print(time.strftime("%Y-%m-%d %H:%M:%S"), "Compressing file MDC: ", path)
                        os.remove(path)
            except PermissionError:
                print(time.strftime("%Y-%m-%d %H:%M:%S"), "Permission error on file:", path, file=log)
            except IndexError:
                print(time.strftime("%Y-%m-%d %H:%M:%S"), "IndexError: ", path, file=log)

As far as I can tell, it creates a stream of data, then compresses it and writes it to a new file with the specified filename. However, instead of storing each XML file separately inside the ".gz" file, it creates a single big file (a big stream of data?) inside it, with the same name as the output ".gz" file but without any extension. Once all the files are compressed, it's not possible to uncompress the big file generated inside the output file. Does anyone know where the problem in my code is?

PS: I have edited the code for readability.

Not sure whether a solution is still needed, but I will leave this here for anyone who faces the same issue.
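For context on why the gzip-only approach produces one big unnamed file: a gzip file holds a compressed byte stream with no directory of entries, and appending to it just adds another compressed member to the same stream. The sketch below is an illustration only. The two payloads and the combined.gz name are made up, and it uses binary mode rather than the text mode in the question.

```python
import gzip
import os
import tempfile

# Hypothetical sample data standing in for two XML files.
documents = [b"<a>first</a>\n", b"<b>second</b>\n"]

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "combined.gz")

    # Append each document to the same .gz file, one gzip member per write.
    for payload in documents:
        with gzip.open(target, "ab") as f_out:
            f_out.write(payload)

    # Reading back yields one continuous stream; the file boundaries
    # and the original file names are not recoverable.
    with gzip.open(target, "rb") as f_in:
        combined = f_in.read()

print(combined)
```

Decompressing gives the two payloads joined end to end, which matches the single extension-less file described in the question.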
One way to create a gzip-compressed archive in Python is with tarfile; the code is quite simple:

import tarfile

# "w:gz" writes a gzip-compressed tar archive
with tarfile.open(filename, mode="w:gz") as archive:
    archive.add(name=name_of_file_to_add, recursive=True)

In this case name_of_file_to_add can be a directory, and tarfile will then add it recursively with all its contents. You will need to import the tarfile module.
If you need to add individual files rather than a whole directory, a simple for loop with calls to add will do (the recursive flag is not required in this case).
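A rough sketch of that loop applied to the question's grouping-by-date requirement could look like the following. The function name archive_by_date, the backup_ archive prefix, and the sample file names are all hypothetical; only the file[1:14] slice is taken from the question, so adjust it to the real naming scheme.

```python
import os
import tarfile
import tempfile
from collections import defaultdict


def archive_by_date(source_dir, dest_dir, date_slice=slice(1, 14)):
    """Write one .tar.gz per date key; return {archive_path: [member names]}."""
    groups = defaultdict(list)
    for name in sorted(os.listdir(source_dir)):
        if name.endswith(".xml"):
            # Same slice as file[1:14] in the question (assumed date position).
            groups[name[date_slice]].append(name)

    created = {}
    for date_key, names in groups.items():
        archive_path = os.path.join(dest_dir, f"backup_{date_key}.tar.gz")
        with tarfile.open(archive_path, mode="w:gz") as archive:
            for name in names:
                # arcname stores only the bare file name inside the archive.
                archive.add(os.path.join(source_dir, name), arcname=name)
        created[archive_path] = names
    return created


# Usage with made-up file names in temporary directories.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for name in ("A20240101.0000_x.xml", "A20240101.0000_y.xml", "A20240102.0000_x.xml"):
        with open(os.path.join(src, name), "w") as f:
            f.write("<root/>")

    result = archive_by_date(src, dst)

    # List each archive to confirm the XML files are separate members.
    listing = {}
    for path in result:
        with tarfile.open(path, "r:gz") as archive:
            listing[os.path.basename(path)] = archive.getnames()

print(listing)
```

Unlike a bare .gz file, each archive keeps every XML file as a named member, so the files can be extracted individually later. Deleting the source files after archiving, as the question's code does, is left out here.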
