
Testing the validity of many gzipped files on a Windows system using Perl

I have thousands (or more) of gzipped files in a directory on a Windows system, and one of my tools consumes those files. If it encounters a corrupt gzip file, it silently ignores it instead of raising an alarm.

I have been trying to write a Perl program that loops through each file and makes a list of files which are corrupt.

I am using the Compress::Zlib module and have tried reading the first 1KB of each file. That did not work, because some of the files are corrupted towards the end (verified with a manual extract, where the error was raised only near the end), so reading the first 1KB doesn't show a problem. I am wondering if a CRC check of these files will be of any help.

Questions:

  1. Will CRC validation work in this case? If yes, how does it work? Will the true CRC be part of the gzip header, and we are to compare it with the calculated CRC from the file we have? How do I accomplish this in Perl?

  2. Are there any other simpler ways to do this?

In short, the only way to check a gzip file is to decompress it until you get an error, or get to the end successfully. You do not however need to store the result of the decompression.
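As a rough illustration of "decompress and discard", here is a minimal sketch in Python rather than Perl. The function names `gzip_is_valid` and `find_corrupt_gzip_files`, the chunk size, and the `*.gz` pattern are assumptions made for this example, not part of any existing tool. It reads each file to the end, throws the output away, and records any file that raises an error on the way.

```python
import glob
import gzip
import os
import zlib


def gzip_is_valid(path, chunk_size=1 << 16):
    """Return (True, None) if the file decompresses cleanly, else (False, reason)."""
    try:
        with gzip.open(path, "rb") as fh:
            # Read and discard the output chunk by chunk; nothing is stored.
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as exc:
        # OSError covers a bad header or failed CRC/length check,
        # EOFError a truncated file, zlib.error a damaged deflate stream.
        return False, str(exc)
    return True, None


def find_corrupt_gzip_files(directory, pattern="*.gz"):
    """Return a sorted list of (path, reason) for files that fail to decompress."""
    corrupt = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        ok, reason = gzip_is_valid(path)
        if not ok:
            corrupt.append((path, reason))
    return corrupt
```

Calling `find_corrupt_gzip_files(r"C:\data")` (a hypothetical directory) would return the files that could not be read to the end, along with the error message for each. Memory use stays flat because only one chunk of output is held at a time.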

The CRC stored at the end of a gzip file is the CRC of the uncompressed data, not the compressed data. To use it for verification, you have to decompress all of the data. This is what gzip -t does, decompressing the data and checking the CRC, but not storing the uncompressed data.

Often a corruption in the compressed data will be detected before the end of the file is reached. If it is not, the CRC, together with a check against the uncompressed length that is also stored at the end, will detect a corrupted file with a probability very close to one.
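To make the trailer check concrete, the following Python sketch reads the two 32-bit little-endian values that RFC 1952 places at the end of a gzip member (the CRC-32 of the uncompressed data, then its length modulo 2^32) and recomputes both by decompressing the file. The helper names are hypothetical, and the sketch assumes a single-member gzip file with no trailing bytes. zlib already performs this comparison internally and raises an error on a mismatch; the manual version is only meant to show what is being compared.

```python
import struct
import zlib


def read_gzip_trailer(path):
    """Return (stored_crc32, stored_length) from the last 8 bytes of the file."""
    with open(path, "rb") as fh:
        fh.seek(-8, 2)  # 8 bytes before the end of the file
        return struct.unpack("<II", fh.read(8))


def crc_and_length_of_uncompressed(path, chunk_size=1 << 16):
    """Decompress in chunks, keeping only a running CRC-32 and a byte count."""
    decomp = zlib.decompressobj(31)  # wbits=31: expect a gzip header and trailer
    crc, length = 0, 0
    with open(path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            out = decomp.decompress(chunk)
            crc = zlib.crc32(out, crc)
            length += len(out)
    out = decomp.flush()
    crc = zlib.crc32(out, crc)
    length += len(out)
    # The stored length is modulo 2**32, so reduce ours the same way.
    return crc, length & 0xFFFFFFFF
```

For an intact file the two functions return the same pair. Note that the stored CRC cannot be checked without decompressing, because it covers the uncompressed bytes, not the compressed ones.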

The Archive::Zip FAQ gives some very good guidance on this.

It looks like the best option for you is to check the CRC of each member of the archives, and a sample program that does this -- ziptest.pl -- comes with the Archive::Zip module installation.

You can easily test whether a file is corrupt just by using the "gunzip -t" command. gunzip is also available for Windows and should come with the gzip package.

