
Extracting file from corrupted GZ

My code snippet can extract a file from a GZ archive and save it as a .txt file, but sometimes the file contains some weird text that crashes the extract module.

Some Gibberish from file:

Method I use:

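import gzip
import os

def unpackgz(name, path):
    # Build the archive path and swap the ".gz" suffix for ".txt".
    file = os.path.join(path, name)
    outfilename = file[:-3] + ".txt"
    # Context managers close both files even if decompression fails.
    with gzip.open(file, 'rb') as inF, open(outfilename, 'wb') as outF:
        outF.write(inF.read())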

My question is how I can get around this. Is there something similar to with open(file, errors='ignore') as fil: that I could use? With my current method I can only extract healthy files.

EDIT to First question

def read_corrupted_file(filename):

    with gzip.open(filename, 'r') as f:
        for line in f:
            try:
                string+=line
            except Exception as e:
                print(e)
    return string

newfile = open("corrupted.txt", 'a+')
cwd = os.getcwd()
srtNameb="service"+str(46)+"b.gz"
localfilename3 = cwd +'\\'+srtNameb     
newfile.write(read_corrupted_file(localfilename3))

This results in multiple errors.

Fixed to working state:

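import gzip
import os

def read_corrupted_file(filename):
    # Append every line that decodes cleanly to corrupted.txt.
    with open("corrupted.txt", 'a+') as newfile:
        try:
            with gzip.open(filename, 'rb') as f:
                for line in f:
                    try:
                        newfile.write(line.decode('ascii'))
                    except Exception as e:
                        # Skip lines that cannot be decoded as ASCII.
                        print(e)
        except Exception as e:
            # Raised when the compressed stream itself is damaged.
            print(e)

cwd = os.getcwd()
srtNameb = "service" + str(46) + "b.gz"
localfilename3 = os.path.join(cwd, srtNameb)
read_corrupted_file(localfilename3)

print('done')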
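
If the only problem is bytes that cannot be decoded (rather than a damaged compressed stream), the per-line decode can probably be replaced by opening the archive in text mode, since gzip.open accepts the same encoding and errors arguments as the built-in open. The following is a minimal sketch under that assumption; the function name gz_to_text and its parameters are made up for illustration and are not part of the original code.

```python
import gzip


def gz_to_text(src_path, dst_path, encoding="ascii"):
    """Decompress src_path to a text file, replacing undecodable bytes.

    Hypothetical helper: returns the number of lines written.
    """
    lines = 0
    # errors="replace" turns each undecodable byte into U+FFFD instead of
    # raising UnicodeDecodeError; errors="ignore" would drop it silently.
    with gzip.open(src_path, "rt", encoding=encoding, errors="replace") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(line)
            lines += 1
    return lines
```

This does not help when the gzip data itself is corrupt; in that case the read still raises an error partway through the file.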

Generally, if the file is corrupt, it will throw an error when you try to unzip it. There is not much you can easily do to recover the data, but if you just want to stop it from crashing, you could use a try/except block.

try:
  pass
except Exception as error:
  print(error)

Applying this logic, you could read the file line by line with gzip inside a try/except block, so that the lines read before a corrupted section are kept even when the read then fails.

import gzip

with gzip.open('input.gz','r') as f:
  for line in f:
    print('got line', line)
