How do I read from a WinZip self-extracting (exe) zip file in Java?

Question

Is there an existing method, or do I need to manually parse and skip the exe block before passing the data to ZipInputStream?

Answer 1

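After reviewing the EXE file format and the ZIP file format and testing various options, it appears the simplest solution is to ignore everything that precedes the first ZIP local file header.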

[Figure: ZIP file layout]

[Figure: ZIP local file header]
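For readers without the figures, here is a minimal sketch of what a local file header looks like in bytes. It assumes the standard layout (4-byte signature 50 4B 03 04, file name length at byte 26, file name starting at byte 30, all fields little-endian). The class name `LocalHeaderPeek` and its helpers are made up for illustration and are not part of the answer's code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class LocalHeaderPeek {
    // Little-endian value of the on-disk bytes 50 4B 03 04.
    public static final int LFH_SIGNATURE = 0x04034b50;

    /** Returns the entry name stored in the local file header at the given offset. */
    public static String entryNameAt(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt(offset) != LFH_SIGNATURE) {
            throw new IllegalArgumentException("no local file header at offset " + offset);
        }
        int nameLength = buf.getShort(offset + 26) & 0xFFFF; // file name length field
        return new String(data, offset + 30, nameLength, StandardCharsets.UTF_8);
    }

    /** Builds a tiny one-entry ZIP in memory (demo data only). */
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            zos.putNextEntry(new ZipEntry("hello.txt"));
            zos.write("hi".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A plain ZIP starts with a local file header at offset 0.
        System.out.println(entryNameAt(sampleZip(), 0));
    }
}
```

In a self-extracting archive the same header exists, only at a later offset, after the executable stub.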

I wrote an input stream filter to skip the preamble, and it works well:

ZipEntry ze;
ZipInputStream zis = new ZipInputStream(
    new WinZipInputStream(
    new FileInputStream("test.exe")));
while ((ze = zis.getNextEntry()) != null) {
    . . .
    zis.closeEntry();
}
zis.close();

WinZipInputStream.java

import java.io.FilterInputStream;
import java.io.InputStream;
import java.io.IOException;

public class WinZipInputStream extends FilterInputStream {
    public static final byte[] ZIP_LOCAL = { 0x50, 0x4b, 0x03, 0x04 };
    protected int ip;
    protected int op;

    public WinZipInputStream(InputStream is) {
        super(is);
    }

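    @Override
    public int read() throws IOException {
        // Skip the EXE stub: consume bytes until the local file header signature is matched.
        while (ip < ZIP_LOCAL.length) {
            int c = super.read();
            if (c == -1) {
                return -1; // end of stream before any signature was found
            }
            if (c == ZIP_LOCAL[ip]) {
                ip++;
            } else {
                // The mismatched byte may itself start a new signature.
                ip = (c == ZIP_LOCAL[0]) ? 1 : 0;
            }
        }

        // Replay the consumed signature, then pass the rest through unchanged.
        if (op < ZIP_LOCAL.length) {
            return ZIP_LOCAL[op++];
        }
        return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (op == ZIP_LOCAL.length) return super.read(b, off, len);
        if (len == 0) return 0;
        int l = 0;
        while (l < Math.min(len, ZIP_LOCAL.length)) {
            int c = read();
            if (c == -1) {
                return l == 0 ? -1 : l; // report end of stream instead of storing -1
            }
            b[off + l++] = (byte) c; // honor the caller's offset
        }
        return l;
    }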
}
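To try the signature-skipping idea without a real WinZip executable, here is a minimal in-memory sketch. It is not the filter above: it searches a byte array for the first 50 4B 03 04 signature and hands the remainder to ZipInputStream. The fake "self-extractor" is an assumption made for the demo (zero bytes followed by a one-entry ZIP), and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class StubSkipDemo {
    private static final byte[] ZIP_LOCAL = { 0x50, 0x4b, 0x03, 0x04 };

    /** Index of the first local file header signature, or -1 if there is none. */
    public static int indexOfLocalHeader(byte[] data) {
        outer:
        for (int i = 0; i + ZIP_LOCAL.length <= data.length; i++) {
            for (int j = 0; j < ZIP_LOCAL.length; j++) {
                if (data[i + j] != ZIP_LOCAL[j]) continue outer;
            }
            return i;
        }
        return -1;
    }

    /** Demo data: stubLength zero bytes (standing in for the EXE) followed by a one-entry ZIP. */
    public static byte[] fakeSelfExtractor(int stubLength) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[stubLength]);
        ZipOutputStream zos = new ZipOutputStream(out);
        zos.putNextEntry(new ZipEntry("hello.txt"));
        zos.write(new byte[] { 'h', 'i' });
        zos.closeEntry();
        zos.finish();
        return out.toByteArray();
    }

    /** Name of the first entry found after skipping the stub, or null if none is found. */
    public static String firstEntryName(byte[] data) throws IOException {
        int start = indexOfLocalHeader(data);
        if (start < 0) return null;
        try (ZipInputStream zis = new ZipInputStream(
                new ByteArrayInputStream(data, start, data.length - start))) {
            ZipEntry ze = zis.getNextEntry();
            return ze == null ? null : ze.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstEntryName(fakeSelfExtractor(100)));
    }
}
```

A real executable stub could contain the same four bytes by coincidence, so a forward search like this is only as reliable as that assumption.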

Answer 2

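The nice thing about ZIP files is their sequential structure: each entry is an independent run of bytes, and at the end there is a Central Directory Index (CDI) that lists every entry in the file along with its offset.

The bad news is that the java.util.zip.* classes ignore that index and simply start reading the file, expecting the first entry to be a Local File Header block. That is not the case for self-extracting ZIP archives, which start with the EXE part.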
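The streaming case is easy to check with a small sketch. It assumes a made-up "stub" of zero bytes in front of a one-entry ZIP, and it only exercises ZipInputStream (not ZipFile). The class and method names are hypothetical. On the JDKs I know of, getNextEntry() returns null when the stream does not begin with a local file header signature.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class StubPrefixCheck {
    /** One-entry ZIP preceded by stubLength zero bytes (a stand-in for an EXE stub). */
    public static byte[] zipWithStub(int stubLength) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[stubLength]);
        ZipOutputStream zos = new ZipOutputStream(out);
        zos.putNextEntry(new ZipEntry("hello.txt"));
        zos.write(new byte[] { 'h', 'i' });
        zos.closeEntry();
        zos.finish();
        return out.toByteArray();
    }

    /** True if ZipInputStream finds an entry when it starts reading at byte 0. */
    public static boolean readableFromStart(byte[] data) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(data))) {
            return zis.getNextEntry() != null;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("plain zip: " + readableFromStart(zipWithStub(0)));
        System.out.println("with stub: " + readableFromStart(zipWithStub(64)));
    }
}
```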

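A few years ago I wrote a custom ZIP parser to extract individual ZIP entries (LFH + data); it relies on the CDI to find where those entries are located in the file. I just checked, and it can list the entries of a self-extracting ZIP archive without further ado and give you their offsets, so you could either:

~~use that code to find the first LFH after the EXE part, copy everything after that offset to a different File, then feed that new File to java.util.zip.ZipFile~~:
Edit: Just skipping the EXE part doesn't seem to work. ZipFile still won't read it, and my native ZIP program complains that the new ZIP file is damaged, reporting exactly the number of bytes I skipped as "missing" (so it actually reads the CDI). I guess some headers would need to be rewritten, so the second approach given below looks more promising; or
use that code for the full ZIP extraction (it is similar to java.util.zip); this will require some additional plumbing, because the code was not originally intended as a replacement ZIP library but had a very specific use case (differential updating of ZIP files over HTTP)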

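The code is hosted on SourceForge (project page, website) and licensed under the Apache License 2.0, so commercial use is fine. AFAIK there is a commercial game that uses it as an updater for its game assets.

The interesting part for getting the offsets from a ZIP file is in Indexer.parseZipFile, which returns a LinkedHashMap<Resource, Long> (so the first map entry has the lowest offset in the file). Here is the code I used to list the entries of a self-extracting ZIP archive (created from an acra release file with the WinZIP SE creator, running under Wine on Ubuntu):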

public static void main(String[] args) throws Exception {
    File archive = new File("/home/phil/downloads", "acra-4.2.3.exe");
    Map<Resource, Long> resources = parseZipFile(archive);
    for (Entry<Resource, Long> resource : resources.entrySet()) {
        System.out.println(resource.getKey() + ": " + resource.getValue());
    }
}

You can probably strip most of the code, except for the Indexer class and the zip package, which contains all the header parsing classes.

Answer 3

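Some self-extracting ZIP files contain fake Local File Header markers. I think it is best to scan the file backwards to find the End Of Central Directory record. The EOCD record contains the offset of the Central Directory, and the CD contains the offset of the first Local File Header. If you start reading from the first byte of the Local File Header, ZipInputStream works fine.

Obviously the code below is not the fastest solution. If you are going to process large files, you should implement some kind of buffering or use memory-mapped files.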

import org.apache.commons.io.EndianUtils;
...

public class ZipHandler {
    private static final byte[] EOCD_MARKER = { 0x06, 0x05, 0x4b, 0x50 };

    public InputStream openExecutableZipFile(Path zipFilePath) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(zipFilePath.toFile(), "r")) {
            long position = raf.length() - 1;
            int markerIndex = 0;
            byte[] buffer = new byte[4];
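            // Scan backwards, matching the EOCD signature bytes in reverse order.
            while (position >= 0) {
                raf.seek(position);
                raf.readFully(buffer, 0, 1);
                if (buffer[0] == EOCD_MARKER[markerIndex]) {
                    markerIndex++;
                } else {
                    // The mismatched byte may itself start a new match.
                    markerIndex = (buffer[0] == EOCD_MARKER[0]) ? 1 : 0;
                }
                if (markerIndex == EOCD_MARKER.length) {
                    // The file pointer is one byte past the start of the EOCD record;
                    // the central directory offset is stored at byte 16 of that record.
                    raf.skipBytes(15);
                    raf.readFully(buffer, 0, 4);
                    int centralDirectoryOffset = EndianUtils.readSwappedInteger(buffer, 0);
                    // The local header offset is stored at byte 42 of the first CD entry.
                    raf.seek(centralDirectoryOffset);
                    raf.skipBytes(42);
                    raf.readFully(buffer, 0, 4);
                    int localFileHeaderOffset = EndianUtils.readSwappedInteger(buffer, 0);
                    return new SkippingInputStream(Files.newInputStream(zipFilePath), localFileHeaderOffset);
                }
                position--;
            }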
            throw new IOException("No EOCD marker found");
        }
    }
}

public class SkippingInputStream extends FilterInputStream {
    private int bytesToSkip;
    private int bytesAlreadySkipped;

    public SkippingInputStream(InputStream inputStream, int bytesToSkip) {
        super(inputStream);
        this.bytesToSkip = bytesToSkip;
        this.bytesAlreadySkipped = 0;
    }

    @Override
    public int read() throws IOException {
        while (bytesAlreadySkipped < bytesToSkip) {
            int c = super.read();
            if (c == -1) {
                return -1;
            }
            bytesAlreadySkipped++;
        }
        return super.read();
    }

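    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (bytesAlreadySkipped == bytesToSkip) {
            return super.read(b, off, len);
        }
        if (len == 0) {
            return 0;
        }
        int count = 0;
        while (count < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            b[off + count++] = (byte) c; // honor the caller's offset
        }
        // Signal end of stream with -1 rather than a zero-length read.
        return count == 0 ? -1 : count;
    }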
}
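As a sketch of the buffering idea, the variant below reads the tail of the file in one bulk read and searches that buffer for the EOCD record, instead of seeking once per byte. It also swaps in one technique that is not in the code above: it assumes the central directory sits directly before the EOCD record and treats any difference between that position and the recorded offset as the stub length, so it copes with archives whose offsets were not adjusted for the EXE part. It assumes a non-empty, non-ZIP64 archive. All names here are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class EocdTailScan {
    private static final int EOCD_SIG = 0x06054b50; // end of central directory record
    private static final int CEN_SIG = 0x02014b50;  // central directory file header
    private static final int EOCD_MIN = 22;         // EOCD size without a comment
    private static final int CEN_MIN = 46;          // CD entry size without a name
    private static final int MAX_COMMENT = 0xFFFF;

    /** Absolute offset of the local file header named by the first central directory entry. */
    public static long firstLocalHeaderOffset(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long length = raf.length();
            int tailSize = (int) Math.min(length, EOCD_MIN + MAX_COMMENT);
            long tailStart = length - tailSize;
            byte[] tail = new byte[tailSize];
            raf.seek(tailStart);
            raf.readFully(tail); // one bulk read instead of one seek per byte
            ByteBuffer buf = ByteBuffer.wrap(tail).order(ByteOrder.LITTLE_ENDIAN);

            for (int i = tailSize - EOCD_MIN; i >= 0; i--) {
                if (buf.getInt(i) != EOCD_SIG) continue;
                long cdSize = buf.getInt(i + 12) & 0xFFFFFFFFL;
                long cdOffset = buf.getInt(i + 16) & 0xFFFFFFFFL;
                // Assumption: the central directory ends where the EOCD record begins.
                long cdStart = tailStart + i - cdSize;
                long shift = cdStart - cdOffset; // bytes in front of the ZIP data, if any
                if (cdSize < CEN_MIN || cdStart < 0 || shift < 0) continue;
                byte[] cen = new byte[CEN_MIN];
                raf.seek(cdStart);
                raf.readFully(cen);
                ByteBuffer c = ByteBuffer.wrap(cen).order(ByteOrder.LITTLE_ENDIAN);
                if (c.getInt(0) != CEN_SIG) continue; // false EOCD match, keep scanning
                return shift + (c.getInt(42) & 0xFFFFFFFFL);
            }
            throw new IOException("No EOCD record found");
        }
    }

    /** Demo data: a temp file with stubLength zero bytes followed by a one-entry ZIP. */
    public static File writeFakeSelfExtractor(int stubLength) throws IOException {
        File file = File.createTempFile("fake-sfx", ".exe");
        file.deleteOnExit();
        try (OutputStream out = new FileOutputStream(file)) {
            out.write(new byte[stubLength]);
            ZipOutputStream zos = new ZipOutputStream(out);
            zos.putNextEntry(new ZipEntry("hello.txt"));
            zos.write(new byte[] { 'h', 'i' });
            zos.closeEntry();
            zos.finish();
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstLocalHeaderOffset(writeFakeSelfExtractor(100)));
    }
}
```

The returned offset can then be used to skip that many bytes of the file before wrapping the stream in a ZipInputStream.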

Answer 4

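TrueZip works best in this case. (At least in my case.)

A self-extracting zip has the format code1 header1 file1 (whereas a normal zip has the format header1 file1)... The code below shows how to extract the zip.

Note, though, that the TrueZip extraction utility complains about the extra bytes and throws an exception.

Here is the code: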

private void Extract(String src, String dst, String incPath) {
    TFile srcFile = new TFile(src, incPath);
    TFile dstFile = new TFile(dst);
    try {
        TFile.cp_rp(srcFile, dstFile, TArchiveDetector.NULL);
    } catch (IOException e) {
        // Handle exception
    }
}

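You can call this method like this: Extract(new String("C:\\2006Production.exe"), new String("c:\\"), "");

The file is extracted to the C drive... you can then perform your own operations on the files. I hope this helps.

Thanks.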


Answer 1
13 (accepted) 2011-10-31 15:10:06

Answer 2
7 2011-10-28 09:41:24

Answer 3
2 2016-02-19 16:21:57

Answer 4
-1 2012-07-02 16:47:18
