How to parse a tar file in C++

I want to download a .tar file that contains multiple directories with 2 files each. The problem is that I can't find a way to read the tar file without actually extracting the files (using tar).

The perfect solution would be something like:

#include <easytar>

Tarfile tar("somefile.tar");
std::string currentFile, currentFileName;
for(int i=0; i<tar.size(); i++){
  currentFile = tar.getFileText(i);
  currentFileName = tar.getFileName(i);
  // do stuff with it
}

I'm probably going to have to write this myself, but any ideas would be appreciated.

I figured this out myself after a bit of work. The tar file spec actually tells you everything you need to know.

First off, every file in the archive is preceded by a 512-byte header, so you can represent it with a char[512] or a char* pointing somewhere into a larger char array (for example, if you have the entire archive loaded into one array).

The header looks like this:

location  size  field
0         100   File name
100       8     File mode
108       8     Owner's numeric user ID
116       8     Group's numeric user ID
124       12    File size in bytes
136       12    Last modification time in numeric Unix time format
148       8     Checksum for header block
156       1     Link indicator (file type)
157       100   Name of linked file
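
As a rough sketch, the table above can be mirrored by a plain struct so the offsets don't have to be repeated as magic numbers. The struct and member names here are made up for illustration. The table only covers the first 257 bytes, so the rest of the block is treated as opaque padding (the later ustar format places extra fields there).

```cpp
#include <cstddef>

// Field layout of the 512-byte tar header, mirroring the table above.
// The struct and member names are illustrative, not taken from any library.
struct TarHeader {
    char name[100];      // offset 0:   file name
    char mode[8];        // offset 100: file mode
    char uid[8];         // offset 108: owner's numeric user ID
    char gid[8];         // offset 116: group's numeric ID
    char size[12];       // offset 124: file size in bytes
    char mtime[12];      // offset 136: last modification time
    char checksum[8];    // offset 148: checksum for header block
    char typeflag;       // offset 156: link indicator (file type)
    char linkname[100];  // offset 157: name of linked file
    char padding[255];   // offset 257: remainder of the block, not covered by the table
};

// Every member is a char or char array, so the compiler inserts no padding.
static_assert(sizeof(TarHeader) == 512, "header must fill one block");
static_assert(offsetof(TarHeader, size) == 124, "size field offset");
static_assert(offsetof(TarHeader, typeflag) == 156, "link indicator offset");
```

With this in place, a header block that has been copied into a TarHeader object can be read as header.name, header.size and so on.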

So if you want the file name, you grab it right here with string filename(&buffer[0], 100); (note the &: the constructor needs a pointer to the first byte, not the byte itself). The file name is null padded, so a string built this way carries the trailing nulls with it. If you check that there's at least one null in the field, you can leave off the size and let the constructor stop at the first null.

Now we want to know if it's a file or a folder. The "link indicator" field has this information, so:
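
One way to read the name without dragging the null padding along is to look for the first null inside the 100-byte field and fall back to the full 100 bytes if there is none. This is only a sketch: tar_entry_name is a made-up helper name, and header is assumed to point at a complete 512-byte header block.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Copy the name field (offset 0, at most 100 bytes) without trailing NULs.
// `header` is assumed to point at a full 512-byte header block.
std::string tar_entry_name(const char *header) {
    const void *nul = std::memchr(header, '\0', 100);
    std::size_t length = 100;  // a name that fills the field has no NUL at all
    if (nul != nullptr) {
        length = static_cast<std::size_t>(static_cast<const char *>(nul) - header);
    }
    return std::string(header, length);
}
```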

// Note that we're comparing to ascii numbers, not ints
switch(buffer[156]){
    case '0': // intentionally dropping through
    case '\0':
        // normal file
        break;
    case '1':
        // hard link
        break;
    case '2':
        // symbolic link
        break;
    case '3':
        // character device (special file)
        break;
    case '4':
        // block device
        break;
    case '5':
        // directory
        break;
    case '6':
        // named pipe
        break;
}

At this point, we already have all of the information we need about directories, but we need one more thing from normal files: the actual file contents.

The length of the file can be stored in two different ways, either as a 0-or-space-padded null-terminated octal string, or "a base-256 coding that is indicated by setting the high-order bit of the leftmost byte of a numeric field".

Numeric values are encoded in octal numbers using ASCII digits, with leading zeroes. For historical reasons, a final NUL or space character should be used. Thus although there are 12 bytes reserved for storing the file size, only 11 octal digits can be stored. This gives a maximum file size of 8 gigabytes on archived files. To overcome this limitation, star in 2001 introduced a base-256 coding that is indicated by setting the high-order bit of the leftmost byte of a numeric field. GNU-tar and BSD-tar followed this idea. Additionally, versions of tar from before the first POSIX standard from 1988 pad the values with spaces instead of zeroes.
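
The 8 gigabyte figure in the quote follows from the digit count: 11 octal digits carry 3 bits each, so the largest value is 8^11 - 1 = 2^33 - 1 bytes. A small compile-time check of that arithmetic (the constant name is made up for illustration):

```cpp
#include <cstdint>

// Largest value that fits in 11 octal digits: 8^11 - 1 = 2^33 - 1.
constexpr std::uint64_t max_octal_size = (std::uint64_t{1} << 33) - 1;

// Eleven octal sevens is the same number written in octal.
static_assert(max_octal_size == 077777777777ULL, "eleven octal sevens");
// One more byte than that is exactly 8 * 1024^3 bytes.
static_assert(max_octal_size + 1 == 8ULL * 1024 * 1024 * 1024, "8 GiB");
```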

Here's how you would read the octal format, but I haven't written code for the base-256 version:

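// in one function
unsigned long long size_of_file = octal_string_to_int(&buffer[124], 11);

// elsewhere
// Reads at most `size` characters: skips leading space padding, then
// stops at the first NUL, space, or other non-octal character.
// The result is 64 bits wide because 11 octal digits can exceed an int.
unsigned long long octal_string_to_int(const char *current_char, unsigned int size){
    unsigned long long output = 0;
    while(size > 0 && *current_char == ' '){
        current_char++;
        size--;
    }
    while(size > 0 && *current_char >= '0' && *current_char <= '7'){
        output = output * 8 + (*current_char - '0');
        current_char++;
        size--;
    }
    return output;
}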
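
For the base-256 variant, a possible sketch is below. It assumes the layout used by GNU tar for sizes: the high bit of the first byte is a marker, and the remaining bits form a big-endian binary number. It also assumes the value is non-negative and fits in 64 bits. The function name tar_numeric_field is made up for illustration, and the same function falls back to octal parsing when the marker bit is not set.

```cpp
#include <cstdint>

// Decode a tar numeric field that is either octal text or base-256 binary.
// Assumes non-negative values that fit in 64 bits; negative base-256
// values are not handled in this sketch.
std::uint64_t tar_numeric_field(const char *field, unsigned int size) {
    const unsigned char *bytes = reinterpret_cast<const unsigned char *>(field);
    std::uint64_t value = 0;
    if (size > 0 && (bytes[0] & 0x80)) {
        // Base-256: big-endian bytes, with the marker bit masked off.
        value = bytes[0] & 0x7F;
        for (unsigned int i = 1; i < size; i++) {
            value = (value << 8) | bytes[i];
        }
        return value;
    }
    // Octal text: skip leading spaces, stop at NUL, space, or any non-digit.
    unsigned int i = 0;
    while (i < size && bytes[i] == ' ') {
        i++;
    }
    for (; i < size && bytes[i] >= '0' && bytes[i] <= '7'; i++) {
        value = value * 8 + static_cast<std::uint64_t>(bytes[i] - '0');
    }
    return value;
}
```

For the size field this would be called as tar_numeric_field(&buffer[124], 12), passing all 12 bytes so the marker bit in the first byte is seen.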

Ok, so now we have everything except the actual file contents. All we have to do is grab the next size bytes of data from the tar file and we'll have our file contents:

// Get to the next block after the header ends
location += 512;
char *file_contents = new char[size];
memcpy(file_contents, &buffer[location], size);
// Go to the next header by rounding the size up to a multiple of 512.
// The data is padded to whole 512-byte blocks, so we advance by
// bytes (blocks * 512), not by the number of blocks.
location += ((size + 511) / 512) * 512;
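
Putting the steps together, a loop over an archive that is already loaded into memory could look roughly like this. It is a sketch under several assumptions: the names TarEntry and read_tar are made up, size fields are octal, names fit in the 100-byte name field, and a block whose first byte is null is treated as the end-of-archive marker (the format ends an archive with zero-filled blocks).

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// One archive member; the struct and function names are illustrative.
struct TarEntry {
    std::string name;
    char type;             // raw link indicator byte from offset 156
    std::string contents;  // filled in for regular files only
};

// Walk a tar archive held in memory and collect its entries.
std::vector<TarEntry> read_tar(const char *buffer, std::size_t length) {
    std::vector<TarEntry> entries;
    std::size_t location = 0;
    while (location + 512 <= length) {
        const char *header = buffer + location;
        if (header[0] == '\0') break;  // zero-filled block: end of archive

        TarEntry entry;
        const void *nul = std::memchr(header, '\0', 100);
        std::size_t name_length = 100;
        if (nul != nullptr) {
            name_length = static_cast<std::size_t>(static_cast<const char *>(nul) - header);
        }
        entry.name.assign(header, name_length);
        entry.type = header[156];

        // Copy the size field so strtoull always sees a terminated string.
        char size_text[13] = {0};
        std::memcpy(size_text, header + 124, 12);
        std::size_t size = static_cast<std::size_t>(std::strtoull(size_text, nullptr, 8));

        location += 512;  // step over the header
        bool regular = entry.type == '0' || entry.type == '\0';
        // Assumption: links, devices, directories and pipes ('1'..'6')
        // are not followed by data blocks; every other type is.
        bool has_data = entry.type < '1' || entry.type > '6';
        if (has_data) {
            if (size > length - location) break;  // truncated archive
            if (regular) entry.contents.assign(buffer + location, size);
            location += (size + 511) / 512 * 512;  // round up to a full block
        }
        entries.push_back(entry);
    }
    return entries;
}
```

This gets close to the interface asked for in the question: entries.size() gives the number of members, and entries[i].name and entries[i].contents give the name and text of each one.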

Have you looked at libtar?

From the fink package info:

libtar-1.2-1: Tar file manipulation API
libtar is a C library for manipulating POSIX tar files. It handles adding and extracting files to/from a tar archive. libtar offers the following features:
* Flexible API - you can manipulate individual files or just extract a whole archive at once.
* Allows user-specified read() and write() functions, such as zlib's gzread() and gzwrite().
* Supports both POSIX 1003.1-1990 and GNU tar file formats.

Not C++ per se, but you can link to C pretty easily...

libarchive is an open source library that can parse tarballs. It can read each file from an archive without extracting it, and it can also write data to form a new archive file.
