
C#.net identify zip file

I am currently using the SharpZip API to handle my zip file entries. It works splendidly for zipping and unzipping. However, I am having trouble identifying whether a file is a zip or not. I need to know if there is a way to detect whether a file stream can be decompressed. Originally I used:

FileStream lFileStreamIn = File.OpenRead(mSourceFile);
lZipFile = new ZipFile(lFileStreamIn);
ZipInputStream lZipStreamTester = new ZipInputStream(lFileStreamIn, mBufferSize);// not working
lZipStreamTester.Read(lBuffer, 0, 0);
if (lZipStreamTester.CanDecompressEntry)
{
    // ...
}

lZipStreamTester becomes null every time and the if statement fails. I tried it with and without a buffer. Can anybody give any insight as to why? I am aware that I can check the file extension, but I need something more definitive than that. I am also aware that zip has a magic number ("PK" something), but it isn't guaranteed to always be there because it isn't a requirement of the format.

I also read about .NET 4.5 having native zip support, so my project may migrate to that instead of SharpZip. However, I didn't see a method or parameter similar to CanDecompressEntry here: http://msdn.microsoft.com/en-us/library/3z72378a%28v=vs.110%29

My last resort will be to use a try/catch and attempt to unzip the file.

This is a base class for a component that needs to handle data that is uncompressed, PKZIP-compressed (SharpZipLib) or GZip-compressed (built into .NET). It is perhaps a bit more than you need, but it should get you going. It is an example of using @PhonicUK's suggestion to parse the header of the data stream. The derived classes you see in the little factory method handle the specifics of PKZip and GZip decompression.

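using System;
using System.Diagnostics;
using System.IO;

abstract class Expander
{
    // Signatures read as little-endian integers: "PK\x03\x04" and 1F 8B.
    // BitConverter uses the machine's byte order, so these checks assume a little-endian CPU.
    private const int ZIP_LEAD_BYTES = 0x04034b50;
    private const ushort GZIP_LEAD_BYTES = 0x8b1f;

    public abstract MemoryStream Expand(Stream stream);

    internal static bool IsPkZipCompressedData(byte[] data)
    {
        Debug.Assert(data != null && data.Length >= 4);
        // if the first 4 bytes of the array are the ZIP signature then it is compressed data
        return (BitConverter.ToInt32(data, 0) == ZIP_LEAD_BYTES);
    }

    internal static bool IsGZipCompressedData(byte[] data)
    {
        Debug.Assert(data != null && data.Length >= 2);
        // if the first 2 bytes of the array are the GZIP signature then it is compressed data
        return (BitConverter.ToUInt16(data, 0) == GZIP_LEAD_BYTES);
    }

    public static bool IsCompressedData(byte[] data)
    {
        return IsPkZipCompressedData(data) || IsGZipCompressedData(data);
    }

    // GZipExpander, ZipExpander and NullExpander are the derived classes
    // described above; they are not shown here.
    public static Expander GetExpander(Stream stream)
    {
        Debug.Assert(stream != null);
        Debug.Assert(stream.CanSeek);
        stream.Seek(0, SeekOrigin.Begin);

        try
        {
            byte[] bytes = new byte[4];

            // Stream.Read may return fewer bytes than requested, so keep reading
            // until 4 bytes have arrived or the stream ends.
            int total = 0;
            int read;
            while (total < bytes.Length && (read = stream.Read(bytes, total, bytes.Length - total)) > 0)
                total += read;

            if (total < bytes.Length)
                return new NullExpander(); // too short to carry either signature

            if (IsGZipCompressedData(bytes))
                return new GZipExpander();

            if (IsPkZipCompressedData(bytes))
                return new ZipExpander();

            return new NullExpander();
        }
        finally
        {
            stream.Seek(0, SeekOrigin.Begin);  // set the stream back to the beginning
        }
    }
}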
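The answer does not show the derived classes. Below is a hedged sketch of what the body of the hypothetical GZipExpander.Expand might look like. It assumes the built-in System.IO.Compression.GZipStream. It is written as a standalone static method (GZipExpansion is a made-up name) so that it compiles without the rest of the class hierarchy.

```csharp
using System.IO;
using System.IO.Compression;

public static class GZipExpansion
{
    // Decompresses GZip data from 'stream' into a new MemoryStream,
    // mirroring the Expander.Expand(Stream) signature from the answer.
    public static MemoryStream Expand(Stream stream)
    {
        var output = new MemoryStream();
        // leaveOpen: true so the caller still owns (and can dispose) the input stream.
        using (var gzip = new GZipStream(stream, CompressionMode.Decompress, leaveOpen: true))
        {
            gzip.CopyTo(output);
        }
        output.Position = 0; // rewind so the caller can read from the start
        return output;
    }
}
```

In the answer's design, this body would sit in an override such as `public override MemoryStream Expand(Stream stream)` inside GZipExpander.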

See https://stackoverflow.com/a/16587134/206730 for reference.

Check the links below:

icsharpcode-sharpziplib-validate-zip-file

How-to-check-if-a-file-is-compressed-in-c#

ZIP files normally start with the local file header signature 0x04034b50 (4 bytes, stored little-endian as 50 4B 03 04, i.e. "PK\x03\x04"). Empty archives, which start with 50 4B 05 06, and self-extracting archives are exceptions.
View more: http://en.wikipedia.org/wiki/Zip_(file_format)#File_headers
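The two ways of writing this signature (the integer 0x04034b50 and the byte string 50-4B-03-04) describe the same four bytes read in little-endian order. The small sketch below makes that explicit. It assumes .NET Core 2.1+ (or the System.Memory package) for BinaryPrimitives, and ZipSignature is a hypothetical helper name. Unlike BitConverter.ToInt32, it does not depend on the machine's byte order.

```csharp
using System;
using System.Buffers.Binary;

public static class ZipSignature
{
    // ZIP local file header signature, "PK\x03\x04" (bytes 50 4B 03 04).
    public const uint LocalFileHeader = 0x04034b50;

    // Reads the first four bytes as a little-endian uint. Unlike
    // BitConverter.ToInt32, the result does not depend on the CPU's byte order.
    public static bool StartsWithLocalFileHeader(ReadOnlySpan<byte> data)
    {
        return data.Length >= 4
            && BinaryPrimitives.ReadUInt32LittleEndian(data) == LocalFileHeader;
    }
}
```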

Sample usage:

        bool isPKZip = IOHelper.CheckSignature(pkg, 4, IOHelper.SignatureZip);
        Assert.IsTrue(isPKZip, "Not ZIP the package : " + pkg);

// http://blog.somecreativity.com/2008/04/08/how-to-check-if-a-file-is-compressed-in-c/
using System;
using System.IO;

public static partial class IOHelper
{
    public const string SignatureGzip = "1F-8B-08";
    public const string SignatureZip = "50-4B-03-04";

    public static bool CheckSignature(string filepath, int signatureSize, string expectedSignature)
    {
        if (String.IsNullOrEmpty(filepath)) throw new ArgumentException("Must specify a filepath");
        if (String.IsNullOrEmpty(expectedSignature)) throw new ArgumentException("Must specify a value for the expected file signature");
        using (FileStream fs = new FileStream(filepath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            if (fs.Length < signatureSize)
                return false;
            byte[] signature = new byte[signatureSize];
            int bytesRequired = signatureSize;
            int index = 0;
            while (bytesRequired > 0)
            {
                int bytesRead = fs.Read(signature, index, bytesRequired);
                if (bytesRead == 0)
                    return false; // file was truncated after the length check (FileShare.ReadWrite allows this)
                bytesRequired -= bytesRead;
                index += bytesRead;
            }
            // BitConverter.ToString formats the bytes as e.g. "50-4B-03-04"
            string actualSignature = BitConverter.ToString(signature);
            return actualSignature == expectedSignature;
        }
    }
}

You can either:

  • Use a try-catch structure and try to read the structure of a potential zip file
  • Parse the file header to see if it is a zip file
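The try/catch option can be sketched with the ZipArchive class that ships in System.IO.Compression since .NET 4.5, instead of SharpZipLib. This is a minimal sketch built on one assumption: a stream "can be decompressed" if it parses as a ZIP and every entry decompresses without an InvalidDataException. ZipProbe and CanDecompressAll are hypothetical names.

```csharp
using System.IO;
using System.IO.Compression;

public static class ZipProbe
{
    // Returns true if the stream parses as a ZIP archive and every entry
    // decompresses without error. The caller keeps ownership of the stream.
    public static bool CanDecompressAll(Stream stream)
    {
        try
        {
            using (var archive = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true))
            {
                foreach (ZipArchiveEntry entry in archive.Entries)
                {
                    // Opening and draining each entry forces the actual decompression.
                    using (Stream entryStream = entry.Open())
                    {
                        entryStream.CopyTo(Stream.Null);
                    }
                }
            }
            return true;
        }
        catch (InvalidDataException)
        {
            // No valid central directory, an unsupported compression method,
            // or corrupt compressed data.
            return false;
        }
    }
}
```

Draining every entry is the expensive part. If you only need to know whether the archive structure parses, you could stop after constructing the ZipArchive. Office files such as .docx will also pass, because they are ZIP archives.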

ZIP files normally have 0x04034b50 as their first 4 bytes (http://en.wikipedia.org/wiki/Zip_(file_format)#File_headers).

I used https://en.wikipedia.org/wiki/List_of_file_signatures. I added one extra byte to the ZIP signature to tell my zip files apart from Word documents, because the two share the first four bytes.

Here is my code:

using System;
using System.IO;

public static class ZipFileUtilities
{
    // 50 4B 03 04 is the local file header; the 5th byte 0x0A is the low byte of
    // "version needed to extract" (1.0). Archives whose entries need 2.0 (0x14),
    // e.g. deflated entries, will not match this signature.
    private static readonly byte[] ZipBytes1 = { 0x50, 0x4b, 0x03, 0x04, 0x0a };
    private static readonly byte[] GzipBytes = { 0x1f, 0x8b };
    private static readonly byte[] TarBytes = { 0x1f, 0x9d };   // LZW-compressed (.Z / .tar.z), not plain tar
    private static readonly byte[] LzhBytes = { 0x1f, 0xa0 };
    private static readonly byte[] Bzip2Bytes = { 0x42, 0x5a, 0x68 };
    private static readonly byte[] LzipBytes = { 0x4c, 0x5a, 0x49, 0x50 };
    private static readonly byte[] ZipBytes2 = { 0x50, 0x4b, 0x05, 0x06 }; // empty archive
    private static readonly byte[] ZipBytes3 = { 0x50, 0x4b, 0x07, 0x08 }; // spanned archive

    public static byte[] GetFirstBytes(string filepath, int length)
    {
        // Read raw bytes with a FileStream; a StreamReader is meant for text.
        using (var fs = File.OpenRead(filepath))
        {
            var bytes = new byte[length];
            int total = 0;
            int read;
            while (total < length && (read = fs.Read(bytes, total, length - total)) > 0)
                total += read;

            // Files shorter than 'length' yield a shorter array.
            Array.Resize(ref bytes, total);
            return bytes;
        }
    }

    // Despite the name, this returns true for any of the compressed formats listed above.
    public static bool IsZipFile(string filepath)
    {
        return IsCompressedData(GetFirstBytes(filepath, 5));
    }

    public static bool IsCompressedData(byte[] data)
    {
        foreach (var headerBytes in new[] { ZipBytes1, ZipBytes2, ZipBytes3, GzipBytes, TarBytes, LzhBytes, Bzip2Bytes, LzipBytes })
        {
            if (HeaderBytesMatch(headerBytes, data))
                return true;
        }

        return false;
    }

    private static bool HeaderBytesMatch(byte[] headerBytes, byte[] dataBytes)
    {
        // Data too short to contain this signature cannot match it.
        if (dataBytes.Length < headerBytes.Length)
            return false;

        for (var i = 0; i < headerBytes.Length; i++)
        {
            if (headerBytes[i] != dataBytes[i])
                return false;
        }

        return true;
    }
}

There may be better ways to code this, particularly the byte compare. Because the compare is variable-length (it depends on the signature being checked), I chose a version that is at least readable, to me anyway.
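One possible alternative for the variable-length byte compare is MemoryExtensions.StartsWith on ReadOnlySpan<byte>. It assumes .NET Core 2.1+ (or the System.Memory package). It compares a prefix of any length and returns false when the data is shorter than the signature. SignatureMatch is a hypothetical name, not part of the answer above.

```csharp
using System;

public static class SignatureMatch
{
    // True when 'data' begins with 'signature'. MemoryExtensions.StartsWith compares
    // a prefix of any length and returns false if 'data' is shorter than 'signature'.
    public static bool StartsWithSignature(byte[] data, byte[] signature)
    {
        return new ReadOnlySpan<byte>(data).StartsWith(new ReadOnlySpan<byte>(signature));
    }
}
```

With this helper, `HeaderBytesMatch(headerBytes, data)` could become `SignatureMatch.StartsWithSignature(data, headerBytes)`. Note the swapped argument order.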

If you are programming for the web, you can check the uploaded file's Content-Type: application/zip. Keep in mind that this header is supplied by the client, so treat it as a hint rather than proof.
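Below is a hedged sketch of such a check. It works on the raw header string, so it does not assume any particular web framework. The list of accepted types is an assumption: application/zip is the registered type, while browsers on Windows commonly send application/x-zip-compressed. UploadChecks is a hypothetical name. Because the header is client-controlled, a real upload handler would still verify the bytes, for example with one of the signature checks in this thread.

```csharp
using System;

public static class UploadChecks
{
    // MIME types commonly sent for .zip uploads (assumed list, extend as needed).
    private static readonly string[] ZipMediaTypes =
    {
        "application/zip",
        "application/x-zip-compressed",
    };

    // Hint only: the Content-Type header is set by the client.
    public static bool HasZipContentType(string contentType)
    {
        if (string.IsNullOrWhiteSpace(contentType)) return false;

        // Drop parameters such as "; name=..." and compare case-insensitively.
        string mediaType = contentType.Split(';')[0].Trim();
        foreach (string zipType in ZipMediaTypes)
        {
            if (string.Equals(mediaType, zipType, StringComparison.OrdinalIgnoreCase))
                return true;
        }
        return false;
    }
}
```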

Thanks to dkackman and Kiquenet for the answers above. For completeness, the code below uses the signature to identify compressed (zip) files. This adds a complication: the newer MS Office file formats (your .docx and .xlsx files, etc.) will also match this signature lookup. As remarked elsewhere, these really are compressed archives. You can rename them with a .zip extension and look at the XML inside.

The code below first checks for ZIP (compressed) data using the signatures above, then runs a second check for MS Office packages. To use System.IO.Packaging.Package you need a project reference to "WindowsBase" (a .NET assembly reference). On .NET Core and .NET 5+, the same type comes from the System.IO.Packaging NuGet package.

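using System;
using System.IO;
using System.IO.Packaging; // WindowsBase on .NET Framework; System.IO.Packaging NuGet package on .NET Core/.NET 5+
using System.Linq;

public static class StreamZipExtensions
{
    private const string SignatureZip = "50-4B-03-04";
    private const string SignatureGzip = "1F-8B-08";

    public static bool IsZip(this Stream stream)
    {
        bool isZip = CheckSignature(stream, 4, SignatureZip);
        bool isGzip = CheckSignature(stream, 3, SignatureGzip);

        bool isSomeKindOfZip = isZip || isGzip;

        // Only a PKZIP signature can belong to a package (docx etc.); GZip data never does.
        if (isZip && stream.IsPackage())
        {
            return false;
        }

        return isSomeKindOfZip;
    }

    // Stream counterpart of IOHelper.CheckSignature: compares the first
    // signatureSize bytes and leaves the stream rewound to the start.
    private static bool CheckSignature(Stream stream, int signatureSize, string expectedSignature)
    {
        stream.Seek(0, SeekOrigin.Begin);
        byte[] signature = new byte[signatureSize];
        int index = 0;
        int bytesRead;
        while (index < signatureSize && (bytesRead = stream.Read(signature, index, signatureSize - index)) > 0)
            index += bytesRead;
        stream.Seek(0, SeekOrigin.Begin);
        return index == signatureSize && BitConverter.ToString(signature) == expectedSignature;
    }

    /// <summary>
    /// MS .docx, .xlsx and other Open XML files are (correctly) identified as zip files by signature lookup.
    /// This tests whether System.IO.Packaging can open the stream; if the package has parts, this is not a plain zip file.
    /// </summary>
    private static bool IsPackage(this Stream stream)
    {
        stream.Seek(0, SeekOrigin.Begin);
        try
        {
            using (Package package = Package.Open(stream, FileMode.Open, FileAccess.Read))
            {
                return package.GetParts().Any();
            }
        }
        catch (Exception ex) when (ex is FileFormatException || ex is InvalidDataException)
        {
            // Not an Open XML package: a plain zip has no [Content_Types].xml, or the data is corrupt.
            return false;
        }
        finally
        {
            stream.Seek(0, SeekOrigin.Begin);
        }
    }
}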

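Statement: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.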

 
Guangdong ICP No. 18138465 © 2020-2024 STACKOOM.COM