My code snippet can extract a file from a .gz archive and save it as a .txt file, but sometimes the archive contains corrupted data that crashes the extraction.
The method I use:
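import gzip
import os

def unpackgz(name, path):
    # Decompress the archive <path>\<name> into a .txt file next to it.
    file = os.path.join(path, name)
    outfilename = file[:-3] + ".txt"  # drop the ".gz" suffix
    # The with statement closes both files even if reading fails.
    with gzip.open(file, 'rb') as inF, open(outfilename, 'wb') as outF:
        outF.write(inF.read())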
My question: how can I work around this? Maybe something similar to with open(file, errors='ignore') as fil: would help, because with my current method I can only extract healthy files.
EDIT to the original question:
def read_corrupted_file(filename):
    with gzip.open(filename, 'r') as f:
        for line in f:
            try:
                string+=line
            except Exception as e:
                print(e)
    return string

newfile = open("corrupted.txt", 'a+')
cwd = os.getcwd()
srtNameb="service"+str(46)+"b.gz"
localfilename3 = cwd +'\\'+srtNameb
newfile.write(read_corrupted_file(localfilename3))
This results in multiple errors.
Fixed to working state:
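import gzip
import os

def read_corrupted_file(filename):
    # Append every line that can be read and decoded to corrupted.txt.
    with open("corrupted.txt", 'a+') as newfile:
        try:
            with gzip.open(filename, 'rb') as f:
                for line in f:
                    try:
                        newfile.write(line.decode('ascii'))
                    except Exception as e:
                        # This line is not valid ASCII; report it and move on.
                        print(e)
        except Exception as e:
            # The gzip stream itself is damaged; reading stops here.
            print(e)

cwd = os.getcwd()
srtNameb = "service" + str(46) + "b.gz"
localfilename3 = os.path.join(cwd, srtNameb)
read_corrupted_file(localfilename3)
print('done')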
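A possible variant of the version above, closer to the open(file, errors='ignore') idea from the question: gzip.open accepts encoding and errors arguments when the file is opened in text mode. This is only a sketch. The function name gz_to_text and its parameters are hypothetical, and it assumes that silently dropping undecodable bytes is acceptable. It handles decoding problems only, not damage to the gzip stream itself.

```python
import gzip


def gz_to_text(src_path, dst_path):
    """Decompress src_path into dst_path, dropping bytes that are not ASCII."""
    # Text mode ('rt') lets gzip.open decode for us; errors='ignore'
    # discards undecodable bytes instead of raising UnicodeDecodeError.
    with gzip.open(src_path, "rt", encoding="ascii", errors="ignore") as src, \
            open(dst_path, "w", encoding="ascii") as dst:
        for line in src:
            dst.write(line)
```

Called as gz_to_text("service46b.gz", "service46b.txt"), this writes every decodable character and skips the rest without printing anything.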
Generally, if the file is corrupt, trying to decompress it will throw an error, and there is not much you can easily do to still get the data. If you just want to stop it from crashing, you can use a try/except block.
try:
    pass
except Exception as error:
    print(error)
Applying this logic, you could read the file line by line with gzip inside a try/except block, so that the lines read before a corrupted section are kept instead of being lost when the error is raised.
import gzip
with gzip.open('input.gz','r') as f:
    for line in f:
        print('got line', line)
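Putting the two ideas together, a minimal sketch might look like the following. The function name salvage_gz, its parameters, and the chunk size are hypothetical choices for illustration. It reads in fixed-size chunks rather than lines, and it assumes that keeping whatever was decompressed before the first error is good enough, since the gzip module typically cannot resume past a damaged section.

```python
import gzip
import zlib


def salvage_gz(src_path, dst_path, chunk_size=64 * 1024):
    """Copy as much decompressed data as possible from src_path to dst_path.

    Returns (bytes_written, error); error is None for a healthy file.
    """
    written = 0
    error = None
    with open(dst_path, "wb") as out:
        try:
            with gzip.open(src_path, "rb") as src:
                while True:
                    chunk = src.read(chunk_size)
                    if not chunk:
                        break  # clean end of stream
                    out.write(chunk)
                    written += len(chunk)
        except (OSError, EOFError, zlib.error) as exc:
            # OSError covers a bad header or CRC, EOFError a truncated file,
            # zlib.error a damaged compressed block.
            error = exc
    return written, error
```

A call such as written, error = salvage_gz("input.gz", "input.txt") leaves the recovered prefix in input.txt and reports the first error instead of crashing. How much data is recovered before the error depends on where the damage sits and on the module's internal buffering.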