Replicating C struct padding in Java

According to here, the C compiler will pad out values when writing a structure to a binary file. As the example in the link says, when writing a struct like this:

struct {
 char c;
 int i;
} a;

to a binary file, the compiler will usually leave an unnamed, unused hole between the char and int fields, to ensure that the int field is properly aligned.

How could I create an exact replica of the binary output file (generated in C) using a different language (in my case, Java)?

Is there an automatic way to apply C padding in Java output? Or do I have to go through the compiler documentation to see how it works (the compiler is g++, by the way)?

Don't do this; it is brittle and will lead to alignment and endianness bugs.

For external data it is much better to explicitly define the format in terms of bytes and write explicit functions to convert between the internal and external formats, using shifts and masks (not a union!).
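On the Java side, such conversion functions might look like the following minimal sketch. It assumes the external format stores a 32-bit integer least significant byte first (little-endian); the class and method names are hypothetical and only serve the illustration.

```java
final class LittleEndianCodec {
    /** Stores value into buf[off..off+3], least significant byte first. */
    static void putIntLE(byte[] buf, int off, int value) {
        buf[off]     = (byte) value;
        buf[off + 1] = (byte) (value >>> 8);
        buf[off + 2] = (byte) (value >>> 16);
        buf[off + 3] = (byte) (value >>> 24);
    }

    /** Rebuilds the int from buf[off..off+3]; the masks undo Java's sign extension of bytes. */
    static int getIntLE(byte[] buf, int off) {
        return (buf[off] & 0xFF)
             | (buf[off + 1] & 0xFF) << 8
             | (buf[off + 2] & 0xFF) << 16
             | (buf[off + 3] & 0xFF) << 24;
    }
}
```

Because the byte positions are spelled out, the result does not depend on the byte order or padding rules of whichever machine runs the code.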

This is true not only when writing to files, but also in memory. It is the fact that the struct is padded in memory that leads to the padding showing up in the file, if the struct is written out byte-by-byte.

It is in general very hard to replicate the exact padding scheme with certainty, although I guess some heuristics would get you quite far. It helps if you have the struct declaration, for analysis.

Typically, fields larger than one char will be aligned so that their starting offset inside the structure is a multiple of their size. This means shorts will generally be on even offsets (divisible by 2, assuming sizeof (short) == 2), while doubles will be on offsets divisible by 8, and so on.
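As a rough illustration of that heuristic, the sketch below computes field offsets in Java. It assumes every field is a scalar whose alignment equals its size (a power of two) and that the struct's total size is rounded up to its largest field alignment; real ABIs and compiler switches can deviate from this, so treat it as a starting point for analysis, not a guarantee. The class and method names are hypothetical.

```java
final class StructLayout {
    /** Rounds offset up to the next multiple of alignment (alignment must be a power of two). */
    static int alignUp(int offset, int alignment) {
        return (offset + alignment - 1) & -alignment;
    }

    /**
     * Returns the offset of each field, followed by the total struct size
     * as the last element. Heuristic only: assumes alignment == size.
     */
    static int[] layout(int... fieldSizes) {
        int[] result = new int[fieldSizes.length + 1];
        int offset = 0;
        int maxAlign = 1;
        for (int k = 0; k < fieldSizes.length; k++) {
            int size = fieldSizes[k];
            offset = alignUp(offset, size);   // insert padding before the field
            result[k] = offset;
            offset += size;
            maxAlign = Math.max(maxAlign, size);
        }
        // Tail padding, so that consecutive structs in an array stay aligned.
        result[fieldSizes.length] = alignUp(offset, maxAlign);
        return result;
    }
}
```

For the struct in the question, with a 1-byte char followed by a 4-byte int, layout(1, 4) places the int at offset 4 and gives a total size of 8.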

UPDATE: It is for reasons like this (and also reasons having to do with endianness) that it is generally a bad idea to dump whole structs out to files. It's better to do it field-by-field, like so:

put_char(out, a.c);
put_int(out, a.i);

Assuming the put functions only write the bytes needed for the value, this will emit a padding-less version of the struct to the file, solving the problem. It is also possible to ensure a proper, known byte ordering by writing these functions accordingly.
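If the C side writes field-by-field like this, the matching Java writer needs no padding logic at all. The sketch below is one possible counterpart; it assumes a 1-byte char, a 4-byte int, and big-endian byte order in the file, all of which are illustrative choices that the actual put functions would have to agree with. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PackedWriter {
    /** Encodes the two fields back to back: 1 byte for c, then 4 bytes for i. */
    static byte[] encode(byte c, int i) {
        ByteBuffer bb = ByteBuffer.allocate(Byte.BYTES + Integer.BYTES);
        bb.order(ByteOrder.BIG_ENDIAN);  // assumed file byte order, stated explicitly
        bb.put(c);                       // relative put: advances the position by 1
        bb.putInt(i);                    // advances the position by 4
        return bb.array();
    }
}
```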

Is there an automatic way to apply C padding in Java output? Or do I have to go through the compiler documentation to see how it works (the compiler is g++, by the way)?

Neither. Instead, you explicitly specify a data/communication format and implement that specification, rather than relying on implementation details of the C compiler. You won't even get the same output from different C compilers.

For interoperability, look at the ByteBuffer class.

Essentially, you create a buffer of a certain size, put() variables of different types at different positions, and then call array() at the end to retrieve the "raw" data representation:

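import java.nio.ByteBuffer;
import java.nio.ByteOrder;

ByteBuffer bb = ByteBuffer.allocate(8);
bb.order(ByteOrder.LITTLE_ENDIAN);
bb.put(0, (byte) someChar);   // a C char is one byte; bytes 1-3 stay zero as padding
bb.putInt(4, someInteger);    // put(int, byte) would store only a single byte here
byte[] rawBytes = bb.array();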

But it's up to you to work out where to put padding, i.e. how many bytes to skip between positions.

For reading data written from C, you generally wrap() a ByteBuffer around some byte array that you've read from a file.
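A minimal sketch of the reading direction follows. It assumes the 8-byte layout used in the writing example (char at offset 0, three padding bytes, little-endian int at offset 4); the class name and layout constants are hypothetical and must be checked against the real file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class PaddedRecord {
    final byte c;
    final int i;

    PaddedRecord(byte c, int i) {
        this.c = c;
        this.i = i;
    }

    /** Decodes one record from raw, skipping the assumed padding at offsets 1-3. */
    static PaddedRecord decode(byte[] raw) {
        ByteBuffer bb = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);
        byte c = bb.get(0);      // absolute read, does not move the position
        int i = bb.getInt(4);    // reads offsets 4-7 in the configured byte order
        return new PaddedRecord(c, i);
    }
}
```

Absolute get methods make the padding explicit: the gap between offset 0 and offset 4 is visible in the code instead of being hidden in a sequence of relative reads.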

In case it's helpful, I've written more on ByteBuffer.

A handy way of reading/writing C structs in Java is to use the javolution Struct class (see http://www.javolution.org ). This won't help you with automatically padding/aligning your data, but it does make working with raw data held in a ByteBuffer much more convenient. If you're not familiar with javolution, it's well worth a look, as there's lots of other cool stuff in there too.

This hole is configurable: the compiler has switches to align structs by 1/2/4/8 bytes.

So the first question is: which alignment exactly do you want to simulate?

With Java, the sizes of data types are defined by the language specification. For example, a byte is 1 byte, a short is 2 bytes, and so on. This is unlike C, where the size of each type is architecture-dependent.

Therefore, it is important to know how the binary file is formatted in order to be able to read the file into Java.

It may be necessary to take steps to be certain that fields are a specific size, to account for differences in the compiler or architecture. The mention of alignment seems to suggest that the output file will depend on the architecture.

You could try Preon:

Preon is a Java library for building codecs for bitstream-compressed data in a declarative (annotation based) way. Think JAXB or Hibernate, but then for binary encoded data.

It can handle big/little-endian binary data, alignment (padding) and various numeric types, along with other features. It is a very nice library; I like it very much.

My $0.02.

I strongly recommend Protocol Buffers for this problem.

As I understand it, you're saying that you don't control the output of the C program. You have to take it as given.

So do you have to read this file for some specific set of structures, or do you have to solve this in the general case? I mean, is the problem that someone said, "Here's the file created by program X, you have to read it in Java"? Or do they expect your Java program to read the C source code, find the structure definition, and then read it in Java?

If you've got a specific file to read, the problem isn't really very difficult. Either by reviewing the C compiler specifications or by studying example files, figure out where the padding is. Then on the Java side, read the file as a stream of bytes, and build the values you know are coming. Basically I'd write a set of functions to read the required number of bytes from an InputStream and turn them into the appropriate data type. For example:

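// Reads len bytes (at most 4) as one integer, most significant byte first.
// PrematureEndOfDataException is an application-defined exception.
int readInt(InputStream is, int len)
  throws IOException, PrematureEndOfDataException
{
  int n = 0;
  while (len-- > 0)
  {
    int i = is.read();  // 0..255, or -1 at end of stream
    if (i == -1)
      throw new PrematureEndOfDataException();
    // Use the unsigned value directly; casting to byte would sign-extend
    // values of 0x80 and above and corrupt the result.
    n = (n << 8) | i;
  }
  return n;
}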
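The function above assembles the bytes most significant first. A file dumped by g++ on an x86 machine would typically hold its integers least significant byte first and would contain padding bytes to skip. The sketch below shows one way to handle both with the standard DataInputStream; the record layout (1-byte char, 3 padding bytes, 4-byte little-endian int) and all class and method names are assumptions for illustration.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class StreamStructReader {
    /** Reads 4 bytes and interprets them as a little-endian int. */
    static int readIntLE(DataInputStream in) throws IOException {
        // readInt() is big-endian by definition, so reverse the byte order.
        return Integer.reverseBytes(in.readInt());
    }

    /** Consumes exactly count padding bytes; throws EOFException if the stream ends early. */
    static void skipPadding(DataInputStream in, int count) throws IOException {
        in.readFully(new byte[count]);
    }

    /** Reads one assumed {char c; int i;} record and returns {c, i}. */
    static int[] readRecord(InputStream source) throws IOException {
        DataInputStream in = new DataInputStream(source);
        int c = in.readUnsignedByte();
        skipPadding(in, 3);   // assumed hole between the char and the int
        int i = readIntLE(in);
        return new int[] { c, i };
    }
}
```

readFully is used for the padding instead of skipBytes because skipBytes may skip fewer bytes than requested without signalling an error.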

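You can change the packing on the C side to ensure that no padding is used, or you can inspect the generated file format in a hex editor, which lets you write a parser in Java that ignores the padding bytes.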
