Gzip multiple files in Python

I have to compress a lot of XML files and group them by the date in the file name. For clarification: a parser collects information from each XML file and then moves the file to a backup folder. My code needs to gzip the files according to the date in the filename and group them in a single compressed .gz file.

Please find the code below:

import os
import re
import gzip
import shutil
import time

timestr = time.strftime("%Y%m%d%H%M")
logfile = 'D:\\Coleta\\log_compactador_xml_tar' + timestr + '.log'
ptm_dir = "D:\\PTM\\monitored_programs\\"
count_files_mdc = 0
count_files_3gpp = 0
count_tar = 0

# Open the log once; print(..., file=...) needs a file object, not a path string.
with open(logfile, 'a') as log:
    for subdir, dirs, files in os.walk(ptm_dir):
        for file in files:
            path = os.path.join(subdir, file)
            try:
                backup_files_dir = path.split(sep='\\')[4]
                parser_id = path.split(sep='\\')[3]
                if re.match('backup_files_*', backup_files_dir):
                    if file.endswith('xml'):
                        # print(time.strftime("%Y-%m-%d %H:%M:%S"), path)
                        data_arq = file[1:14]
                        # Trailing comma makes this a tuple; ('parser-924') is just a string.
                        if parser_id in ('parser-924',):
                            gzip_filename_mdc = os.path.join(subdir, 'E4G_PM_MDC_IP51_' + timestr + '_' + data_arq)
                            # Appends the XML text to the gzip file; both files are closed when the block exits.
                            with open(path, 'r') as f_in, gzip.open(gzip_filename_mdc + ".gz", 'at') as f_out_mdc:
                                shutil.copyfileobj(f_in, f_out_mdc)
                            count_files_mdc += 1
                            print(time.strftime("%Y-%m-%d %H:%M:%S"), "Compressing file MDC: ", path)
                            os.remove(path)
            except PermissionError:
                print(time.strftime("%Y-%m-%d %H:%M:%S"), "Permission error on file:", path, file=log)
            except IndexError:
                print(time.strftime("%Y-%m-%d %H:%M:%S"), "IndexError: ", path, file=log)

As far as I can tell, it creates a stream of data, then compresses it and writes it to a new file with the specified filename. However, instead of storing each XML file as a separate entry inside the ".gz" file, it creates one big file (one big stream of data?) inside the gzip file, with the same name as the output gzip file but without any extension. Once the files have been compressed, it is not possible to uncompress the big file generated inside the gzip output file. Does anyone know where the problem in my code is?
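The behaviour described above can be reproduced in a few lines. This is only a minimal sketch: the helper names and the file name grouped.gz are made up for illustration and are not part of the original script. It shows that each append in 'at' mode adds data to the same gzip stream, and that reading the file back yields one concatenated text with no record of where one source file ended and the next began.

```python
import gzip
import os
import tempfile


def append_to_gzip(gz_path, text):
    # Mode 'at' appends the text to the gzip file as a new compressed member.
    with gzip.open(gz_path, "at", encoding="utf-8") as f_out:
        f_out.write(text)


def read_gzip(gz_path):
    # Reading returns the decompressed members joined together.
    with gzip.open(gz_path, "rt", encoding="utf-8") as f_in:
        return f_in.read()


def demo():
    # grouped.gz is a hypothetical file name used only for this demonstration.
    with tempfile.TemporaryDirectory() as tmp:
        gz_path = os.path.join(tmp, "grouped.gz")
        append_to_gzip(gz_path, "<a>first</a>\n")
        append_to_gzip(gz_path, "<b>second</b>\n")
        return read_gzip(gz_path)


if __name__ == "__main__":
    print(demo())
```

The gzip format has no table of file names, so an archive tool can only show a single entry named after the .gz file itself.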

PS: I have edited the code for readability purposes.

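Not sure whether the solution is still needed, but I will just leave it here for anyone who faces the same issue.
A .gz file holds a single compressed stream, so on its own it cannot keep several files apart. There is a way to create a gzip-compressed archive in Python using the tarfile module, and the code is quite simple: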

with tarfile.open(filename, mode="w:gz") as archive:
    archive.add(name=name_of_file_to_add, recursive=True)

Here name_of_file_to_add can be a directory, in which case tarfile adds it recursively with all its contents. Obviously, you will need to import the tarfile module.
If you need to add files without a directory, a simple for loop with calls to add will do (the recursive flag is not needed in this case).
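The for loop mentioned above could look something like the sketch below, which also groups the files by date as the question asks. It rests on several assumptions: the function names, the "backup_" prefix and the flat source directory are hypothetical, and the date is taken from the slice [1:14] of the file name only because the question's code does the same.

```python
import os
import tarfile
from collections import defaultdict


def group_by_date(filenames, date_slice=slice(1, 14)):
    # Group file names by the date portion of the name.
    # The default slice mirrors file[1:14] from the question; adjust as needed.
    groups = defaultdict(list)
    for name in filenames:
        groups[name[date_slice]].append(name)
    return dict(groups)


def archive_groups(src_dir, dest_dir, prefix="backup_"):
    # Write one .tar.gz per date key; each XML file stays a separate member.
    xml_files = sorted(f for f in os.listdir(src_dir) if f.endswith(".xml"))
    created = []
    for date_key, names in sorted(group_by_date(xml_files).items()):
        archive_path = os.path.join(dest_dir, prefix + date_key + ".tar.gz")
        with tarfile.open(archive_path, mode="w:gz") as archive:
            for name in names:
                # arcname stores only the file name, not the full source path.
                archive.add(os.path.join(src_dir, name), arcname=name)
        created.append(archive_path)
    return created
```

Calling archive_groups(source_folder, target_folder) returns the list of archives it created. Unlike appending to a plain .gz file, each XML file can later be extracted individually with tarfile or any tar tool.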
