
Extra bytes are padded in the file generated by cpio

I have a list of files in a directory and I want to pack them into a single archive file. I used cpio to create it:

ls |  cpio -ov -H crc > demo.cpio

and I have a cpio header structure like this:

struct cpio_newc_header {
        char    c_magic[6];
        char    c_ino[8];
        char    c_mode[8];
        char    c_uid[8];
        char    c_gid[8];
        char    c_nlink[8];
        char    c_mtime[8];
        char    c_filesize[8];
        char    c_devmajor[8];
        char    c_devminor[8];
        char    c_rdevmajor[8];
        char    c_rdevminor[8];
        char    c_namesize[8];
        char    c_check[8];
};

I am able to fetch the metadata, the pathname and the file data by using c_filesize and c_namesize from the header. I can read the file data based on c_filesize, but after the file data some extra bytes are padded, i.e. between the end of the file data and the next header.

00000230: 6e63 6965 7322 3a5b 5d0d 0a7d 0d0a 0000  ncies":[]..}....
00000240: 3037 3037 3032 3030 3636 4246 3838 3030  0707020066BF8800

Here we can see that some extra bytes are padded after the '}'. I thought the size was being rounded up to a multiple of four, but I found other data that does not seem to end on a multiple of four:

00000450: 2066 6f72 2063 7279 7074 6f20 7665 7269   for crypto veri
00000460: 6669 6361 7469 6f6e 0a00 0000 3037 3037  fication....0707

Why are the extra bytes padded? Can this be avoided when running cpio?

From the manpage of cpio (section New ASCII Format):

The pathname is followed by NUL bytes so that the total size of the fixed header plus pathname is a multiple of four. Likewise, the file data is padded to a multiple of four bytes. Note that this format supports only 4 gigabyte files (unlike the older ASCII format, which supports 8 gigabyte files).

See also man 5 cpio.
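To make the layout quoted from the manpage concrete, here is a minimal sketch in C of the offset arithmetic for one archive entry. The names pad4, parse_hex8, data_offset and next_header_offset are hypothetical helpers, not part of cpio. The sketch assumes the newc/crc layout with a 110-byte fixed header (a 6-byte magic plus 13 fields of 8 hex digits, matching the struct in the question) and that every header starts at an offset that is a multiple of four.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed fixed header size: 6-byte magic + 13 fields of 8 hex digits. */
#define CPIO_NEWC_HEADER_SIZE 110u

/* Round n up to the next multiple of four. */
size_t pad4(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Parse one 8-character hexadecimal header field (the fields are not
 * NUL-terminated). Returns 0 on success, -1 on a non-hex character. */
int parse_hex8(const char field[8], unsigned long *out)
{
    unsigned long v = 0;
    for (int i = 0; i < 8; i++) {
        char c = field[i];
        unsigned d;
        if (c >= '0' && c <= '9')
            d = (unsigned)(c - '0');
        else if (c >= 'a' && c <= 'f')
            d = (unsigned)(c - 'a') + 10u;
        else if (c >= 'A' && c <= 'F')
            d = (unsigned)(c - 'A') + 10u;
        else
            return -1;
        v = (v << 4) | d;
    }
    *out = v;
    return 0;
}

/* Offset of the file data: fixed header plus pathname, padded to four. */
size_t data_offset(size_t header_off, size_t namesize)
{
    assert(header_off % 4 == 0); /* assumption: headers are 4-byte aligned */
    return header_off + pad4(CPIO_NEWC_HEADER_SIZE + namesize);
}

/* Offset of the next header: end of the file data, padded to four. */
size_t next_header_offset(size_t data_off, size_t filesize)
{
    return pad4(data_off + filesize);
}
```

A reader would parse c_namesize and c_filesize with parse_hex8, skip to data_offset to read the file contents, and then jump to next_header_offset instead of assuming that the next header follows the data immediately.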

In your second example, the data is also padded to 4-byte alignment:

00000460: 6669 6361 7469 6f6e 0a00 0000 3037 3037  fication....0707

As you can see, the last data byte (0x0a) is at offset 0x468, and three zero bytes of padding follow at 0x469 to 0x46b, so the next header can start at 0x46c.

This padding is probably there to avoid unaligned access to the header fields after the archive has been read into memory. It is part of the format specification, so there is no option to avoid it.

But it is easy to calculate. If x is the offset of the first byte after the end of the file data, then the next header begins at offset:

int nextheader = (x+3)&~3;
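As a rough check of that formula against the two hex dumps in the question, the sketch below generalises it to any power-of-two alignment. The helper names align_up and padding_bytes are made up for illustration, and size_t is used instead of int only so that large offsets do not overflow.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t x, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Number of padding bytes between the end of the data and the next header. */
size_t padding_bytes(size_t x, size_t align)
{
    return align_up(x, align) - x;
}
```

In the first dump the data ends just before offset 0x23e, so align_up(0x23e, 4) gives 0x240 with two bytes of padding. In the second dump the data ends just before 0x469, so align_up(0x469, 4) gives 0x46c with three bytes of padding. Both match the positions where the next 0707 magic appears.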
