
How do I get the filename of a file inside a gzip in Java?

    try {
        int BUFFER_SIZE = 4096;
        byte[] buffer = new byte[BUFFER_SIZE];
        InputStream input = new GZIPInputStream(new FileInputStream("a_gunzipped_file.gz"));
        OutputStream output = new FileOutputStream("current_output_name");
        // Copy the decompressed bytes until read() signals end of stream with -1
        int n = input.read(buffer, 0, BUFFER_SIZE);
        while (n >= 0) {
            output.write(buffer, 0, n);
            n = input.read(buffer, 0, BUFFER_SIZE);
        }
        input.close();
        output.close();
    } catch (IOException e) {
        System.out.println("error: \n\t" + e.getMessage());
    }

Using the above code I can successfully extract a gzip's contents, although the extracted file's name will, as expected, always be current_output_name (I know that's because I declared it that way in the code). My problem is that I don't know how to get the file's name while it is still inside the archive.

Although java.util.zip provides ZipEntry, I couldn't use it on gzip files. Are there any alternatives?

I mostly agree with Michael Borgwardt's reply, but it is not entirely true: the gzip file specification includes an optional file name stored in the header of the .gz file. Sadly, there is no way (as far as I know) of getting that name in current Java (1.6). In OpenJDK's implementation of GZIPInputStream, the readHeader method simply skips over the file name:

// Skip optional file name
if ((flg & FNAME) == FNAME) {
      while (readUByte(in) != 0) ;
}

I have modified the GZIPInputStream class to get the optional file name out of the gzip archive (I'm not sure whether I am allowed to do that), starting from the original OpenJDK source. You only need to add a member String filename; to the class and change the code above to:

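 // Read the optional file name (zero-terminated ISO-8859-1 bytes, per RFC 1952)
 if ((flg & FNAME) == FNAME) {
      StringBuilder name = new StringBuilder();
      int _byte;
      while ((_byte = readUByte(in)) != 0) {
           name.append((char) _byte);
      }
      filename = name.toString();
 }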

And it worked for me.

Actually, the GZIP file format does allow the original file name to be specified. A gzip file consists of one or more members, each with its own header, and when the FNAME bit is set in a header's FLG byte, a zero-terminated original file name follows. I do not see a way to read it with the Java libraries, though.

http://www.gzip.org/zlib/rfc-gzip.html#specification
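Since the standard library does not expose the name, one workaround is to parse the header by hand before handing the file to GZIPInputStream. The following is only a minimal sketch based on the header layout in RFC 1952 (magic bytes, CM, FLG, six bytes of MTIME/XFL/OS, optional FEXTRA block, then the optional zero-terminated name). The class name GzipHeaderName and the method readOriginalName are made up for this illustration, and it assumes Java 7+ for StandardCharsets.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK: reads only the header of the first gzip member.
class GzipHeaderName {
    private static final int FEXTRA = 0x04; // FLG bit 2: extra field present
    private static final int FNAME = 0x08;  // FLG bit 3: original file name present

    /** Returns the original file name stored in the header, or null if none is stored. */
    static String readOriginalName(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        // ID1 and ID2 are the gzip magic bytes
        if (data.readUnsignedByte() != 0x1f || data.readUnsignedByte() != 0x8b) {
            throw new IOException("Not in GZIP format");
        }
        // CM must be 8 (deflate)
        if (data.readUnsignedByte() != 8) {
            throw new IOException("Unsupported compression method");
        }
        int flg = data.readUnsignedByte();
        data.readFully(new byte[6]); // skip MTIME (4 bytes), XFL, OS
        if ((flg & FEXTRA) != 0) {
            // XLEN is a little-endian 16-bit length, followed by that many bytes
            int xlen = data.readUnsignedByte() | (data.readUnsignedByte() << 8);
            data.readFully(new byte[xlen]);
        }
        if ((flg & FNAME) == 0) {
            return null; // the writer did not store a name
        }
        ByteArrayOutputStream name = new ByteArrayOutputStream();
        int b;
        while ((b = data.readUnsignedByte()) != 0) { // name is zero-terminated
            name.write(b);
        }
        return new String(name.toByteArray(), StandardCharsets.ISO_8859_1);
    }
}
```

It could be called as GzipHeaderName.readOriginalName(new FileInputStream("a_gunzipped_file.gz")), after which the file is reopened with GZIPInputStream for the actual extraction. A null result is common, because many writers never store a name, so a fallback output name is still needed.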

Following the answers above, here is an example that creates a file "myTest.csv.gz" which decompresses to "myTest.csv". Notice that with GZIPOutputStream you can't set the internal file name (tools derive it by dropping the .gz extension), and you can't add more files to the .gz file.

@Test
public void gzipFileName() throws Exception {
    File workingFile = new File( "target", "myTest.csv.gz" );
    GZIPOutputStream gzipOutputStream = new GZIPOutputStream( new FileOutputStream( workingFile ) );

    PrintWriter writer = new PrintWriter( gzipOutputStream );
    writer.println("hello,line,1");
    writer.println("hello,line,2");
    writer.close();

}
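If a name really has to be stored inside the file, one possibility is to write the gzip member by hand instead of using GZIPOutputStream. The sketch below is an illustration built from the RFC 1952 layout, not something taken from the answers here: a 10-byte header with the FNAME flag set, the zero-terminated name, a raw deflate stream, and a trailer holding the CRC-32 and the uncompressed length. The class NamedGzipWriter and its method gzipWithName are hypothetical names, and the code assumes Java 7+ for StandardCharsets.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

// Hypothetical helper, not part of the JDK.
class NamedGzipWriter {
    /** Compresses data into a single gzip member whose header carries originalName. */
    static byte[] gzipWithName(byte[] data, String originalName) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // ID1, ID2, CM=8 (deflate), FLG=0x08 (FNAME), MTIME=0 (4 bytes), XFL=0, OS=255 (unknown)
        out.write(new byte[] {0x1f, (byte) 0x8b, 8, 0x08, 0, 0, 0, 0, 0, (byte) 0xff});
        out.write(originalName.getBytes(StandardCharsets.ISO_8859_1));
        out.write(0); // the name is zero-terminated

        // nowrap=true produces a raw deflate stream without a zlib wrapper, as gzip requires
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true);
        DeflaterOutputStream deflated = new DeflaterOutputStream(out, deflater);
        deflated.write(data);
        deflated.finish();
        deflater.end();

        // Trailer: CRC-32 of the uncompressed data, then its length, both little-endian
        CRC32 crc = new CRC32();
        crc.update(data);
        writeIntLE(out, (int) crc.getValue());
        writeIntLE(out, data.length);
        return out.toByteArray();
    }

    private static void writeIntLE(OutputStream out, int value) throws IOException {
        out.write(value & 0xff);
        out.write((value >>> 8) & 0xff);
        out.write((value >>> 16) & 0xff);
        out.write((value >>> 24) & 0xff);
    }
}
```

The result can still be read with a plain GZIPInputStream, which skips the name. This version holds everything in memory, so it is only suitable for small inputs.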

Apache Commons Compress offers two options for obtaining the filename:

With metadata (Java 7+ sample code)

try (GzipCompressorInputStream gcis =
         new GzipCompressorInputStream(new FileInputStream("a_gunzipped_file.gz"))) {
    String filename = gcis.getMetaData().getFilename();
}

With "the convention"

 String filename = GzipUtils.getUncompressedFilename("a_gunzipped_file.gz");


Gzip is purely compression. There is no archive; it's just the file's data, compressed.

The convention is for gzip to append .gz to the filename, and for gunzip to remove that extension. So logfile.txt becomes logfile.txt.gz when compressed, and logfile.txt again when it's decompressed. If you rename the file, the name information is lost.
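In plain Java, that convention can be approximated with a few lines of string handling. This is only a sketch: the class GzipNames and the method uncompressedName are hypothetical names, and it handles nothing beyond a trailing .gz (matched case-insensitively, which is an assumption of this example).

```java
import java.util.Locale;

// Hypothetical helper illustrating the ".gz" naming convention.
class GzipNames {
    /** Drops a trailing ".gz"; returns the name unchanged if the suffix is missing. */
    static String uncompressedName(String gzName) {
        String lower = gzName.toLowerCase(Locale.ROOT);
        // Require at least one character before the suffix so ".gz" alone is left as-is
        if (lower.endsWith(".gz") && gzName.length() > 3) {
            return gzName.substring(0, gzName.length() - 3);
        }
        return gzName;
    }
}
```

For the code in the question, the output stream could then be opened with new FileOutputStream(GzipNames.uncompressedName("a_gunzipped_file.gz")) instead of a hard-coded name.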
