How to access a zipEntry from a streamed zip file in memory

Question

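I'm currently implementing an e-reader library (skyepub), which requires me to implement a method that checks whether a ZipEntry exists. In their demo version, the solution is simple: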

public boolean isExists(String baseDirectory,String contentPath) {
    setupZipFile(baseDirectory,contentPath);
    if (this.isCustomFont(contentPath)) {
        String path = baseDirectory +"/"+ contentPath;
        File file = new File(path);
        return file.exists();
    }

    ZipEntry entry = this.getZipEntry(contentPath);
    return entry != null;
}

// Entry name should start without / like META-INF/container.xml 

private ZipEntry getZipEntry(String contentPath) {

    if (zipFile==null) return null;

    String[] subDirs = contentPath.split(Pattern.quote(File.separator));

    String corePath = contentPath.replace(subDirs[1], "");

    corePath=corePath.replace("//", "");

    ZipEntry entry = zipFile.getEntry(corePath.replace(File.separatorChar, '/'));

    return entry;

}

As you can see, getZipEntry(contentPath) gives you access to the ZipEntry in question in O(1) time.

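In my case, however, I can't read the zip file directly from the file system (for security reasons it has to be read from memory). So my ifExists implementation actually walks through the zip file one entry at a time until it finds the ZipEntry in question. Here is the relevant part: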

try {
    final InputStream stream = dbUtil.getBookStream(bookEditionID);
    if (stream == null) return null;

    final ZipInputStream zip = new ZipInputStream(stream);

    // Advance entry by entry until the name matches or the stream ends.
    ZipEntry entry;
    do {
        entry = zip.getNextEntry();
        if (entry == null) {
            zip.close();
            return null;
        }
    } while (!entry.getName().equals(zipEntryName));

} catch (IOException e) {
    Log.e("demo", "Can't get content data for " + contentPath);
    return null;
}

return data;

So ifExists returns true if the data exists and false otherwise.
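For reference, the linear scan described above can be written as a self-contained method. This is only a sketch of the O(n) baseline, not the asker's actual code: the class name StreamScan and the choice to wrap IOException in UncheckedIOException are assumptions made for illustration, and the InputStream stands in for whatever dbUtil.getBookStream returns.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, not part of skyepub.
class StreamScan {
    /**
     * Returns true if the archive read from the stream contains an entry
     * with exactly the given name. Reads entries in order, so the cost is
     * O(n) in the number of entries (plus skipping their compressed data).
     */
    static boolean exists(InputStream stream, String entryName) {
        // try-with-resources closes the stream on every path, including
        // the early return when the entry is found.
        try (ZipInputStream zip = new ZipInputStream(stream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Compared with the excerpt above, the stream is also closed when the entry is found, not only when the end of the archive is reached.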

Question

Is there a way to find the zip entry in question from the whole ZipInputStream in O(1) time instead of O(n) time?

Related

See this question and this answer.

Answer 1

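If the contents of the archive are in memory, then it is seekable and you can search for the central directory and use it yourself. Neither ZipFile nor Apache Commons Compress currently accepts anything other than a File, but other open-source libraries might (not sure about zip4j).

The code in Apache Commons Compress' ZipFile that searches for the central directory and parses it should be easy to adapt to an archive held as a byte[]. In fact, there is a patch, not yet applied, that may help as part of COMPRESS-327.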
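To illustrate what "searching for the central directory" in a byte[] involves, here is a minimal sketch that locates the end of central directory record by scanning backwards for its signature. It is not the Commons Compress code; the class and method names are made up for this example, and it assumes a single-disk archive without zip64 extensions.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, written for illustration only.
class EocdLocator {
    static final int EOCD_SIGNATURE = 0x06054b50; // bytes "PK\5\6", little-endian
    static final int EOCD_MIN_SIZE = 22;          // record size without a comment
    static final int MAX_COMMENT = 0xFFFF;        // comment length is a 16-bit field

    /** Returns the offset of the end of central directory record, or -1. */
    static int findEocd(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        int lowest = Math.max(0, zip.length - EOCD_MIN_SIZE - MAX_COMMENT);
        for (int pos = zip.length - EOCD_MIN_SIZE; pos >= lowest; pos--) {
            if (buf.getInt(pos) != EOCD_SIGNATURE) {
                continue;
            }
            // Guard against a signature that merely occurs inside the comment:
            // the stored comment length must reach exactly to the end of the data.
            int commentLength = buf.getShort(pos + 20) & 0xFFFF;
            if (pos + EOCD_MIN_SIZE + commentLength == zip.length) {
                return pos;
            }
        }
        return -1;
    }

    /** Total number of central directory entries (16-bit field at offset 10). */
    static int entryCount(byte[] zip, int eocd) {
        return ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN).getShort(eocd + 10) & 0xFFFF;
    }

    /** Offset of the first central directory header (32-bit field at offset 16). */
    static long centralDirectoryOffset(byte[] zip, int eocd) {
        return ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN).getInt(eocd + 16) & 0xFFFFFFFFL;
    }
}
```

Once the record is found, the central directory offset and entry count are enough to walk the central directory headers without touching any file data.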

Answer 2

Entries in a zip archive can't really be loaded in O(1) time. If we look at the structure of a zip archive, it looks like this:

  [local file header 1]
  [encryption header 1]
  [file data 1]
  [data descriptor 1]
  ... 
  [local file header n]
  [encryption header n]
  [file data n]
  [data descriptor n]
  [archive decryption header] 
  [archive extra data record] 
  [central directory header 1]
  .
  [central directory header n]
  [zip64 end of central directory record]
  [zip64 end of central directory locator] 
  [end of central directory record]

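Basically, there are the compressed files, each with some headers, and a "central directory" that contains all the metadata about the files (the central directory headers). The only valid way to find an entry is to scan the central directory (more info):

... must not scan for entries from the top of the ZIP file, because only the central directory specifies where a file chunk starts

Since there is no index over the central directory headers, an entry can only be found in O(n), where n is the number of files in the archive.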

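Update: Unfortunately, all the zip libraries I know of that work with streams rather than files use the local file headers and scan the whole stream, including the content. They are not easy to bend, either. The only way I found to avoid scanning the whole archive is to adapt a library yourself.

Update 2: I've taken the liberty of modifying the above-mentioned zip4j library for your purposes. Assuming you have read your zip file into a byte array and have added a dependency on zip4j version 1.3.2, you can use MemoryHeaderReader and RandomByteStream like this: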

String myZipFile = "...";
byte[] bytes = readFile();
MemoryHeaderReader headerReader = new MemoryHeaderReader(RandomAccessStream.fromBytes(bytes));
ZipModel zipModel = headerReader.readAllHeaders();
FileHeader myFile = Zip4jUtil.getFileHeader(zipModel, myZipFile);
boolean fileIsPresent = myFile != null;

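It works in O(entryCount) without reading the whole archive, which should be reasonably fast. I haven't tested it thoroughly, but it should give you an idea of how to adapt zip4j for your purposes.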

Answer 3

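Technically, the search is always O(n), where n is the number of entries in the zip file, because you have to do a linear search through either the central directory or the local headers.

You seem to imply that the zip file is loaded entirely into memory. In that case, the fastest approach is to search for the entry in the central directory. If you find it, that directory entry will point to the local header.

If you do a lot of searches on the same zip file, you can build a hash table of the names in the central directory in O(n) time, and then use it to look up a given name in roughly O(1) time.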
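The hash table idea could look roughly like the sketch below, which walks the central directory of an in-memory archive once and maps each entry name to its local header offset. The class ZipNameIndex and its methods are hypothetical names chosen for this example. It assumes a single-disk archive without zip64 extensions, smaller than 2 GiB, and decodes names as UTF-8 (which also covers plain ASCII names; archives using other encodings would need extra handling).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical index class, written for illustration only.
class ZipNameIndex {
    private static final int EOCD_SIGNATURE = 0x06054b50;
    private static final int CENTRAL_HEADER_SIGNATURE = 0x02014b50;
    private static final int CENTRAL_HEADER_FIXED_SIZE = 46;

    private final Map<String, Long> localHeaderOffsets = new HashMap<>();

    /** Builds the index in one O(n) pass over the central directory. */
    ZipNameIndex(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        int eocd = findEocd(buf);
        if (eocd < 0) {
            throw new IllegalArgumentException("end of central directory record not found");
        }
        int count = buf.getShort(eocd + 10) & 0xFFFF; // total number of entries
        int pos = buf.getInt(eocd + 16);              // start of the central directory
        for (int i = 0; i < count; i++) {
            if (buf.getInt(pos) != CENTRAL_HEADER_SIGNATURE) {
                throw new IllegalArgumentException("bad central directory header at " + pos);
            }
            int nameLength = buf.getShort(pos + 28) & 0xFFFF;
            int extraLength = buf.getShort(pos + 30) & 0xFFFF;
            int commentLength = buf.getShort(pos + 32) & 0xFFFF;
            long localHeaderOffset = buf.getInt(pos + 42) & 0xFFFFFFFFL;
            String name = new String(zip, pos + CENTRAL_HEADER_FIXED_SIZE,
                    nameLength, StandardCharsets.UTF_8);
            localHeaderOffsets.put(name, localHeaderOffset);
            // Headers are variable-length: skip name, extra field and comment.
            pos += CENTRAL_HEADER_FIXED_SIZE + nameLength + extraLength + commentLength;
        }
    }

    /** Expected O(1) existence check. */
    boolean contains(String name) {
        return localHeaderOffsets.containsKey(name);
    }

    /** Offset of the entry's local file header, or -1 if there is no such entry. */
    long localHeaderOffset(String name) {
        return localHeaderOffsets.getOrDefault(name, -1L);
    }

    /** Scans backwards for the end of central directory record; returns -1 if absent. */
    private static int findEocd(ByteBuffer buf) {
        int length = buf.capacity();
        int lowest = Math.max(0, length - 22 - 0xFFFF);
        for (int pos = length - 22; pos >= lowest; pos--) {
            if (buf.getInt(pos) == EOCD_SIGNATURE
                    && pos + 22 + (buf.getShort(pos + 20) & 0xFFFF) == length) {
                return pos;
            }
        }
        return -1;
    }
}
```

Building the index costs one pass over the central directory only; after that, each contains call is a hash lookup, so the approach pays off when the same book is queried many times.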

3 answers

Answer 1
1 2016-04-23 10:45:48

Answer 2
1 accepted 2016-04-23 13:05:05

Answer 3
1 2016-04-23 14:38:04
