

Testing the validity of many gzipped files on a Windows system using Perl

I have thousands (or more) of gzipped files in a directory (on a Windows system), and one of my tools consumes those gzipped files. If it encounters a corrupt gzip file, it conveniently ignores the file instead of raising an alarm.

I have been trying to write a Perl program that loops through each file and makes a list of the files that are corrupt.

I am using the Compress::Zlib module and have tried reading the first 1 KB of each file, but that did not work: some of the files are corrupted towards the end (verified by manual extraction, where the error is raised only towards the end), so reading the first 1 KB does not reveal a problem. I am wondering if a CRC check of these files would be of any help.

Questions:

  1. Will CRC validation work in this case? If yes, how does it work? Is the true CRC part of the gzip header, and are we to compare it with the CRC calculated from the file we have? How do I accomplish this in Perl?

  2. Are there any other simpler ways to do this?

In short, the only way to check a gzip file is to decompress it until you either get an error or reach the end successfully. You do not, however, need to store the result of the decompression.
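A minimal Perl sketch of that idea, assuming the core IO::Uncompress::Gunzip module is available: each file is decompressed in chunks and the output is discarded. The sub name `gzip_file_is_valid` and the `.gz` filename filter are illustrative choices, not part of the original question. `Strict => 1` asks the module to compare the trailer's CRC32 and length against the decompressed data, and `MultiStream => 1` makes it continue through concatenated gzip members.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Decompress $path completely, discarding the output.
# Returns (1, '') if the whole file decompressed cleanly, else (0, $message).
# (The sub name is hypothetical, chosen for this example.)
sub gzip_file_is_valid {
    my ($path) = @_;

    # Strict => 1 enables the CRC32/length checks against the trailer;
    # MultiStream => 1 keeps going through concatenated gzip members.
    my $z = IO::Uncompress::Gunzip->new($path, Strict => 1, MultiStream => 1)
        or return (0, "cannot open: $GunzipError");

    my ($buf, $status);
    # Each read replaces the contents of $buf, so memory use stays bounded.
    1 while ($status = $z->read($buf, 64 * 1024)) > 0;

    my $message = $status < 0 ? "$GunzipError" : '';
    $z->close;
    return $status < 0 ? (0, $message) : (1, '');
}

# Usage: perl check_gz.pl DIR [DIR ...]
# Prints one line per file that fails the check.
for my $dir (@ARGV) {
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    for my $name (sort grep { /\.gz\z/i } readdir $dh) {
        my ($ok, $why) = gzip_file_is_valid("$dir/$name");
        print "CORRUPT: $dir/$name ($why)\n" unless $ok;
    }
    closedir $dh;
}
```

Because the loop only reports failures, an empty output means every `.gz` file in the given directories decompressed to the end without an error.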

The CRC stored at the end of a gzip file is the CRC of the uncompressed data, not the compressed data. To use it for verification, you have to decompress all of the data. This is what gzip -t does: it decompresses the data and checks the CRC, but does not store the uncompressed data.

Often, corruption in the compressed data will be detected before the end is reached. If not, the CRC, together with a check against the uncompressed length that is also stored at the end, will detect a corrupted file with a probability very close to one.
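To make the location of those two values concrete, here is a small Perl sketch that reads the 8-byte trailer of a single-member gzip file and compares it with data that has already been decompressed. Per RFC 1952 the trailer holds the CRC32 followed by the uncompressed length modulo 2^32, both as little-endian 32-bit integers. The sub names `read_gzip_trailer` and `matches_trailer` are hypothetical, and the sketch assumes the file contains exactly one gzip member.

```perl
use strict;
use warnings;
use Compress::Zlib qw(crc32);

# Read the last 8 bytes of a gzip file: CRC32, then ISIZE (length mod 2^32).
# Assumes a single-member gzip file; sub names are illustrative only.
sub read_gzip_trailer {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    seek $fh, -8, 2 or die "File too short for a gzip trailer: $path";
    my $n = read $fh, my $trailer, 8;
    close $fh;
    die "Short read on $path" unless defined $n && $n == 8;
    my ($crc, $isize) = unpack 'V V', $trailer;   # two little-endian uint32
    return ($crc, $isize);
}

# True when already-decompressed $data agrees with the stored CRC32 and length.
sub matches_trailer {
    my ($data, $crc, $isize) = @_;
    return crc32($data) == $crc
        && (length($data) % 4294967296) == $isize;
}
```

Note that `matches_trailer` needs the full uncompressed data, which is why the stored CRC cannot be checked without decompressing the file.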

The Archive::Zip FAQ gives some very good guidance on this.

It looks like the best option for you is to check the CRC of each member of the archives. A sample program that does this, ziptest.pl, comes with the Archive::Zip module installation.

You can easily test whether a file is corrupt using just the "gunzip -t" command. gunzip is also available for Windows and should come with the gzip package.
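If an external gzip is installed, the same test can be driven from Perl. The sketch below assumes a `gzip` executable is on PATH (`gzip -t` performs the same integrity test as `gunzip -t`); the sub name `gzip_t_passes` is hypothetical.

```perl
use strict;
use warnings;

# Run the external "gzip -t" on $path and report whether it exited with 0.
# Assumes a gzip executable is on PATH; dies if it cannot be started.
sub gzip_t_passes {
    my ($path) = @_;
    my $rc = system('gzip', '-t', $path);
    die "Could not run gzip: $!" if $rc == -1;
    return $rc == 0;
}

# Usage: perl gzip_t.pl FILE.gz [FILE.gz ...]
for my $path (@ARGV) {
    print "CORRUPT: $path\n" unless gzip_t_passes($path);
}
```

This starts one process per file, so for thousands of files an in-process check may be faster, but it needs no Perl modules beyond the core language.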
