How to parse a tar file in C++

Question

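What I want to do is download a .tar file containing multiple directories, each with 2 files. The problem is that I can't find a way to read the tar file without actually extracting the files (using tar).

The perfect solution would be something like this: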

#include <easytar>

Tarfile tar("somefile.tar");
std::string currentFile, currentFileName;
for(int i=0; i<tar.size(); i++){
  currentFile = tar.getFileText(i);
  currentFileName = tar.getFileName(i);
  // do stuff with it
}

I'll probably have to write this myself, but any ideas would be appreciated.

Answer 1

After some work, I figured this out myself. The tar file specification actually tells you everything you need to know.

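First, every file starts with a 512-byte header, so you can represent it with a char[512], or with a char* pointing somewhere inside a larger char array (for example, if you have loaded the whole file into one array).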
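As a minimal sketch of the "load the whole file into one array" approach, the helper below reads an archive into a std::vector<char>. The name read_whole_file is made up for illustration, and a very large archive would probably be better read block by block.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read an entire file into memory as raw bytes.
std::vector<char> read_whole_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);  // binary mode: no newline translation
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    return std::vector<char>(std::istreambuf_iterator<char>(in),
                             std::istreambuf_iterator<char>());
}
```

With `std::vector<char> archive = read_whole_file("somefile.tar");`, a header is the 512 bytes starting at `archive.data() + location`.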

The header looks like this:

location  size  field
0         100   File name
100       8     File mode
108       8     Owner's numeric user ID
116       8     Group's numeric user ID
124       12    File size in bytes
136       12    Last modification time in numeric Unix time format
148       8     Checksum for header block
156       1     Link indicator (file type)
157       100   Name of linked file

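So if you want the file name, you can grab it right here with string filename(&buffer[0], 100); . The file name is NUL-padded, so a string built this way also carries the trailing NULs. If you check that the field contains at least one NUL, you can leave off the size, and the string then stops at the first NUL.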
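To make that check concrete, here is a minimal sketch that extracts a NUL-padded text field from a header. The constant names and the header_field helper are hypothetical, not part of any library. The helper stops at the first NUL but never reads past the end of the field, so it also copes with a 100-character name that fills the field with no terminator.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Offset and size of the file name, taken from the header table
// (the constant names are illustrative).
constexpr std::size_t kNameOffset = 0;
constexpr std::size_t kNameSize = 100;

// Copy a NUL-padded field out of a 512-byte header block.
// Stops at the first NUL, or at the end of the field if there is none.
std::string header_field(const char* header, std::size_t offset, std::size_t size) {
    const char* start = header + offset;
    const void* nul = std::memchr(start, '\0', size);
    std::size_t length = size;
    if (nul != nullptr) {
        length = static_cast<std::size_t>(static_cast<const char*>(nul) - start);
    }
    return std::string(start, length);
}
```

Usage would look like `std::string filename = header_field(buffer, kNameOffset, kNameSize);`.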

Now we want to know whether it is a file or a folder. The "link indicator" field has this information, so:

// Note that we're comparing to ascii numbers, not ints
switch(buffer[156]){
    case '0': // intentionally dropping through
    case '\0':
        // normal file
        break;
    case '1':
        // hard link
        break;
    case '2':
        // symbolic link
        break;
    case '3':
        // character device (special file)
        break;
    case '4':
        // block device
        break;
    case '5':
        // directory
        break;
    case '6':
        // named pipe
        break;
}
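The same switch can be wrapped in a small function so the rest of a parser can branch on a readable value. This is only a sketch: EntryKind and entry_kind are invented names, '3' is treated as a character device as in the POSIX ustar format, and any type flag not listed above is lumped together as Other.

```cpp
// Illustrative names; not from any library.
enum class EntryKind {
    RegularFile, HardLink, SymbolicLink, CharacterDevice,
    BlockDevice, Directory, NamedPipe, Other
};

// Map the type flag stored at offset 156 of a header to an EntryKind.
EntryKind entry_kind(char type_flag) {
    switch (type_flag) {
        case '0':   // falls through: both '0' and NUL mean a normal file
        case '\0':  return EntryKind::RegularFile;
        case '1':   return EntryKind::HardLink;
        case '2':   return EntryKind::SymbolicLink;
        case '3':   return EntryKind::CharacterDevice;
        case '4':   return EntryKind::BlockDevice;
        case '5':   return EntryKind::Directory;
        case '6':   return EntryKind::NamedPipe;
        default:    return EntryKind::Other;  // extension flags are not decoded here
    }
}
```

A caller would pass `buffer[156]` and, for example, skip everything that is not RegularFile.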

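At this point we have all the information we need about directories, but we need one more thing from normal files: the actual file contents.

The length of the file can be stored in two different ways, either as a zero- or space-padded, NUL-terminated octal string, or as "a base-256 coding that is indicated by setting the high-order bit of the leftmost byte of a numeric field".

Numeric values are encoded in octal numbers using ASCII digits, with leading zeroes. For historical reasons, a final NUL or space character should be used. Thus although there are 12 bytes reserved for storing the file size, only 11 octal digits can be stored. This gives a maximum file size of 8 gigabytes on archived files. To overcome this limitation, star in 2001 introduced a base-256 coding that is indicated by setting the high-order bit of the leftmost byte of a numeric field. GNU-tar and BSD-tar followed this idea. Additionally, versions of tar from before the first POSIX standard from 1988 pad the values with spaces instead of zeroes.

Here's how you would read the octal format, but I haven't written code for the base-256 version: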

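// in one function
std::uint64_t size_of_file = octal_string_to_int(&buffer[124], 11);

// elsewhere (needs <cstdint>)
// A 64-bit result is used because 11 octal digits can exceed a 32-bit int.
std::uint64_t octal_string_to_int(const char *current_char, unsigned int size){
    std::uint64_t output = 0;
    // Pre-POSIX archives pad with spaces instead of zeroes, so skip them
    while(size > 0 && *current_char == ' '){
        current_char++;
        size--;
    }
    // Stop at the first byte that is not an octal digit (the NUL or space terminator)
    while(size > 0 && *current_char >= '0' && *current_char <= '7'){
        output = output * 8 + (*current_char - '0');
        current_char++;
        size--;
    }
    return output;
}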
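For the base-256 variant that the answer leaves unwritten, the following is a minimal sketch rather than a reference implementation. It assumes the layout used for non-negative values: the top bit of the first byte is only a marker, and the remaining bits of the field form a big-endian binary number. The names is_base256 and base256_to_uint, and reporting overflow through a bool return value, are choices made for this sketch; negative values are not handled.

```cpp
#include <cstddef>
#include <cstdint>

// True if the numeric field uses base-256 (high bit of the leftmost byte set).
bool is_base256(const char* field) {
    return (static_cast<unsigned char>(field[0]) & 0x80) != 0;
}

// Decode a non-negative base-256 field of `size` bytes into `out`.
// Returns false if the value does not fit in 64 bits.
bool base256_to_uint(const char* field, std::size_t size, std::uint64_t& out) {
    // Drop the marker bit from the first byte.
    std::uint64_t value = static_cast<unsigned char>(field[0]) & 0x7F;
    for (std::size_t i = 1; i < size; ++i) {
        if ((value >> 56) != 0) {
            return false;  // shifting left by 8 would lose the top byte
        }
        value = (value << 8) | static_cast<unsigned char>(field[i]);
    }
    out = value;
    return true;
}
```

A size reader could call is_base256(&buffer[124]) first and fall back to the octal routine when it returns false.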

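OK, now we have everything except the actual file contents. All we have to do is grab the next size bytes of data from the tar file (the size we just decoded), and we have the file contents: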

// Get to the next block after the header ends
location += 512;
file_contents = new char[size_of_file];
memcpy(file_contents, &buffer[location], size_of_file);
// Go to the next header: the contents occupy whole 512-byte blocks,
// so round the size up to a multiple of 512 before advancing.
location += ((size_of_file + 511) / 512) * 512;
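Putting the steps together, here is a hedged sketch of a loop that walks every entry of an archive already loaded into memory. TarEntry, parse_octal and list_entries are hypothetical names. The sketch reads only octal sizes, treats a zero-filled block as the end of the archive, and ignores the ustar prefix field and GNU long-name extensions.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct TarEntry {             // illustrative type, not from any library
    std::string name;
    char type_flag;
    std::uint64_t size;
    std::size_t data_offset;  // where the contents start inside the buffer
};

// Parse an octal field, skipping leading spaces and stopping at NUL or space.
static std::uint64_t parse_octal(const char* p, std::size_t n) {
    std::uint64_t value = 0;
    while (n > 0 && *p == ' ') { ++p; --n; }
    while (n > 0 && *p >= '0' && *p <= '7') {
        value = value * 8 + static_cast<std::uint64_t>(*p - '0');
        ++p; --n;
    }
    return value;
}

std::vector<TarEntry> list_entries(const std::vector<char>& buffer) {
    std::vector<TarEntry> entries;
    std::size_t location = 0;
    while (location + 512 <= buffer.size()) {
        const char* header = buffer.data() + location;
        // A zero-filled block marks the end of the archive.
        if (std::all_of(header, header + 512, [](char c) { return c == '\0'; })) {
            break;
        }
        TarEntry entry;
        const void* nul = std::memchr(header, '\0', 100);
        std::size_t name_len = 100;
        if (nul != nullptr) {
            name_len = static_cast<std::size_t>(static_cast<const char*>(nul) - header);
        }
        entry.name.assign(header, name_len);
        entry.type_flag = header[156];
        entry.size = parse_octal(header + 124, 12);
        entry.data_offset = location + 512;
        if (entry.size > buffer.size() - entry.data_offset) {
            break;  // truncated archive: contents would run past the buffer
        }
        entries.push_back(entry);
        // Skip the contents, rounded up to a multiple of 512 bytes.
        location = entry.data_offset
                 + static_cast<std::size_t>((entry.size + 511) / 512) * 512;
    }
    return entries;
}
```

For each returned entry, the contents are the entry.size bytes starting at buffer[entry.data_offset], so nothing has to be extracted to disk.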

Answer 2

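Have you looked at libtar?

From the fink package info:

libtar-1.2-1: Tar file manipulation API. libtar is a C library for manipulating POSIX tar files. It handles adding files to and extracting files from a tar archive. libtar offers the following features:
* Flexible API - you can manipulate individual files or just extract a whole archive at once.
* Allows user-specified read() and write() functions, such as zlib's gzread() and gzwrite().
* Supports both POSIX 1003.1-1990 and GNU tar file formats.

Not C++ per se, but you can link to C pretty easily...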

Answer 3

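libarchive is an open-source library that can parse tarballs. It can read each file from an archive without extracting it, and it can also write data to form a new archive file.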


Answer 1
32, accepted, 2010-03-24 21:25:48

Answer 2
12, 2010-03-24 02:55:38

Answer 3
4, 2016-01-13 15:29:49
