How do I convert a big-endian struct to a little-endian struct?

I have a binary file that was created on a unix machine. It's just a bunch of records written one after another. The record is defined something like this:

struct RECORD {
  UINT32 foo;
  UINT32 bar;
  CHAR fooword[11];
  CHAR barword[11];
  UINT16 baz;
};

I am trying to figure out how I would read and interpret this data on a Windows machine. I have something like this:

fstream f;
f.open("file.bin", ios::in | ios::binary);

RECORD r;

f.read((char*)&r, sizeof(RECORD));

cout << "fooword = " << r.fooword << endl;

I get a bunch of data, but it's not the data I expect. I suspect that my problem has to do with the endian difference of the machines, so I've come to ask about that.

I understand that multi-byte values are stored little-endian on Windows and big-endian in a unix environment, and I get that. For two bytes, 0x1234 on Windows will be 0x3412 on a unix system.

Does endianness affect the byte order of the struct as a whole, or of each individual member of the struct? What approaches would I take to convert a struct created on a unix system to one that has the same data on a Windows system? Any links that are more in depth than the byte order of a couple of bytes would be great, too!

As well as the endianness, you need to be aware of padding differences between the two platforms. Particularly if you have odd-length char arrays and 16-bit values, you may well find different numbers of pad bytes between some elements.
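As a rough illustration of the padding point, the struct's in-memory size can be compared with the sum of its field sizes at compile time. This is only a sketch: it assumes the fixed-width types from <cstdint> stand in for UINT32 and UINT16, and Record, kPackedSize, has_padding and baz_offset are names made up for the example.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for the RECORD struct in the question (field widths are assumed).
struct Record {
    std::uint32_t foo;
    std::uint32_t bar;
    char fooword[11];
    char barword[11];
    std::uint16_t baz;
};

// Size one record would occupy in the file if written with no padding at all.
constexpr std::size_t kPackedSize = 4 + 4 + 11 + 11 + 2;

// True if this compiler inserted padding somewhere in the struct.
constexpr bool has_padding() { return sizeof(Record) != kPackedSize; }

// Offset of the 16-bit member, worth comparing between the two platforms.
constexpr std::size_t baz_offset() { return offsetof(Record, baz); }
```

If has_padding() or baz_offset() gives different answers in the unix build and the Windows build, a single f.read of sizeof(RECORD) bytes will not line up with what is in the file.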

Edit: if the structure was written out with no packing, then it should be fairly straightforward. Something like this (untested) code should do the job:

// Functions to swap the endian of 16 and 32 bit values

inline void SwapEndian(UINT16 &val)
{
    val = (val<<8) | (val>>8);
}

inline void SwapEndian(UINT32 &val)
{
    val = (val<<24) | ((val<<8) & 0x00ff0000) |
          ((val>>8) & 0x0000ff00) | (val>>24);
}

Then, once you've loaded the struct, just swap each element:

SwapEndian(r.foo);
SwapEndian(r.bar);
SwapEndian(r.baz);

Actually, endianness is a property of the underlying hardware, not the OS.

The best solution is to convert to a standard byte order when writing the data. Google "network byte order" and you should find the methods to do this.

Edit: here's the link: http://www.gnu.org/software/hello/manual/libc/Byte-Order.html
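For what it's worth, a minimal sketch of reading network-order (big-endian) values with ntohl and ntohs might look like the following. load_network32 and load_network16 are hypothetical helper names; on Windows the functions come from winsock2.h and need Ws2_32.lib at link time, while on POSIX systems they come from arpa/inet.h.

```cpp
#include <cstdint>
#include <cstring>
#ifdef _WIN32
#include <winsock2.h>   // ntohl/ntohs on Windows (link with Ws2_32.lib)
#else
#include <arpa/inet.h>  // ntohl/ntohs on POSIX systems
#endif

// Interpret 4 raw bytes from the file as a big-endian (network order) value.
inline std::uint32_t load_network32(const unsigned char* bytes) {
    std::uint32_t raw;
    std::memcpy(&raw, bytes, sizeof raw);  // copy without alignment assumptions
    return ntohl(raw);                     // network order -> host order
}

// Same idea for a 16-bit value.
inline std::uint16_t load_network16(const unsigned char* bytes) {
    std::uint16_t raw;
    std::memcpy(&raw, bytes, sizeof raw);
    return ntohs(raw);
}
```

On a big-endian host ntohl and ntohs do nothing, so the same source works on both kinds of machine.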

Don't read directly into a struct from a file! The packing might be different, so you have to fiddle with pragma pack or similar compiler-specific constructs. Too unreliable. A lot of programmers get away with this since their code isn't compiled on a wide range of architectures and systems, but that doesn't mean it's an OK thing to do!

A good alternative approach is to read the header, or whatever, into a buffer and parse from there, to avoid the I/O overhead of atomic operations like reading an unsigned 32-bit integer!

char buffer[32];
char* temp = buffer;  

f.read(buffer, 32);  

RECORD rec;
rec.foo = parse_uint32(temp); temp += 4;
rec.bar = parse_uint32(temp); temp += 4;
memcpy(&rec.fooword, temp, 11); temp += 11;
memcpy(&rec.barword, temp, 11); temp += 11;
rec.baz = parse_uint16(temp); temp += 2;

The declaration of parse_uint32 would look like this:

uint32 parse_uint32(char* buffer)
{
  uint32 x;
  // ...
  return x;
}

This is a very simple abstraction; in practice it doesn't cost anything extra to update the pointer as well:

uint32 parse_uint32(char*& buffer)
{
  uint32 x;
  // ...
  buffer += 4;
  return x;
}
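The body elided with "// ..." depends on the file's byte order. Assuming the file is big-endian, as in the question, one possible way to fill it in is sketched below; std::uint32_t and std::uint16_t from <cstdint> stand in for the uint32 type used above, and the parse_uint16 body is likewise an assumption.

```cpp
#include <cstdint>

// Hypothetical body for parse_uint32, assuming big-endian input.
inline std::uint32_t parse_uint32(char*& buffer) {
    // Go through unsigned char so bytes >= 0x80 are not sign-extended.
    const unsigned char* p = reinterpret_cast<const unsigned char*>(buffer);
    std::uint32_t x = (static_cast<std::uint32_t>(p[0]) << 24) |
                      (static_cast<std::uint32_t>(p[1]) << 16) |
                      (static_cast<std::uint32_t>(p[2]) << 8) |
                       static_cast<std::uint32_t>(p[3]);
    buffer += 4;  // advance past the bytes just consumed
    return x;
}

// Same pattern for a 16-bit value.
inline std::uint16_t parse_uint16(char*& buffer) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(buffer);
    std::uint16_t x = static_cast<std::uint16_t>((p[0] << 8) | p[1]);
    buffer += 2;
    return x;
}
```

Because the value is assembled with shifts, the result does not depend on the byte order of the machine running the code.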

The latter form allows cleaner code for parsing the buffer; the pointer is automatically updated when you parse from the input.

Likewise, memcpy could have a helper, something like:

void parse_copy(void* dest, char*& buffer, size_t size)
{
  memcpy(dest, buffer, size);
  buffer += size;
}

The beauty of this kind of arrangement is that you can have namespaces "little_endian" and "big_endian", then you can do this in your code:

using namespace little_endian;
// do your parsing for little_endian input stream here..

It's easy to switch endianness for the same code, though that is a rarely needed feature; file formats usually have a fixed endianness anyway.
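A minimal sketch of the two-namespace arrangement could look like this. The function bodies and the read_first_field caller are illustrative assumptions, and const unsigned char* is used for the buffer here to sidestep sign extension.

```cpp
#include <cstdint>

namespace big_endian {
// Most significant byte comes first in the buffer.
inline std::uint32_t parse_uint32(const unsigned char*& buffer) {
    std::uint32_t x = (std::uint32_t(buffer[0]) << 24) | (std::uint32_t(buffer[1]) << 16) |
                      (std::uint32_t(buffer[2]) << 8)  |  std::uint32_t(buffer[3]);
    buffer += 4;
    return x;
}
}  // namespace big_endian

namespace little_endian {
// Least significant byte comes first in the buffer.
inline std::uint32_t parse_uint32(const unsigned char*& buffer) {
    std::uint32_t x =  std::uint32_t(buffer[0])        | (std::uint32_t(buffer[1]) << 8) |
                      (std::uint32_t(buffer[2]) << 16) | (std::uint32_t(buffer[3]) << 24);
    buffer += 4;
    return x;
}
}  // namespace little_endian

// Hypothetical caller: only the using-directive changes with the file's byte order.
inline std::uint32_t read_first_field(const unsigned char* data) {
    using namespace big_endian;
    return parse_uint32(data);
}
```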

DO NOT abstract this into a class with virtual methods; that would just add overhead, but feel free to if so inclined:

little_endian_reader reader(data, size);
uint32 x = reader.read_uint32();
uint32 y = reader.read_uint32();

The reader object would obviously just be a thin wrapper around a pointer. The size parameter would be for error checking, if any. It's not really mandatory for the interface per se.

Notice how the choice of endianness here was made at COMPILATION TIME (since we create a little_endian_reader object), so we would incur the virtual method overhead for no particularly good reason, so I wouldn't go with this approach. ;-)
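For comparison, the same interface needs no virtual methods at all. The following is a minimal sketch of such a reader, with the size parameter used only for bounds checking; the member names, the remaining() helper and the choice to throw std::out_of_range are assumptions for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical non-virtual reader for little-endian input.
class little_endian_reader {
public:
    little_endian_reader(const unsigned char* data, std::size_t size)
        : cur_(data), end_(data + size) {}

    std::uint32_t read_uint32() {
        require(4);
        std::uint32_t x =  std::uint32_t(cur_[0])        | (std::uint32_t(cur_[1]) << 8) |
                          (std::uint32_t(cur_[2]) << 16) | (std::uint32_t(cur_[3]) << 24);
        cur_ += 4;
        return x;
    }

    // Bytes left between the read position and the end of the buffer.
    std::size_t remaining() const { return static_cast<std::size_t>(end_ - cur_); }

private:
    // Error checking made possible by the size parameter.
    void require(std::size_t n) const {
        if (remaining() < n) throw std::out_of_range("read past end of buffer");
    }

    const unsigned char* cur_;
    const unsigned char* end_;
};
```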

At this stage there is no real reason to keep the "fileformat struct" around as-is; you can organize the data to your liking and not necessarily read it into any specific struct at all; after all, it's just data. When you read files like images, you don't really need the header around. You should have your own image container that is the same for all file types, so the code to read a specific format should just read the file, interpret and reformat the data, and store the payload. =)

I mean, does this look complicated?

uint32 xsize = buffer.read<uint32>();
uint32 ysize = buffer.read<uint32>();
float aspect = buffer.read<float>();    

The code can look that nice and still be really low-overhead! If the endianness is the same for the file and the architecture the code is compiled for, the inner loop can look like this:

uint32 value = *reinterpret_cast<uint32*>(ptr); ptr += 4;
return value;

That might be illegal on some architectures, so that optimization might be a Bad Idea; use a slower but more robust approach instead:

uint32 value = ptr[0] | (static_cast<uint32>(ptr[1]) << 8) | ...; ptr += 4;
return value;

On x86 that can compile into bswap or mov, which is reasonably low-overhead if the method is inlined; the compiler would insert a "move" node into the intermediate code, nothing else, which is fairly efficient. If alignment is a problem, the full read-shift-or sequence might get generated. Ouch, but still not too shabby. A compare-and-branch could allow the optimization: test the address LSBs and see whether the fast or slow version of the parsing can be used. But this would mean paying the penalty of the test on every read. Might not be worth the effort.
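To make the buffer.read<uint32>() style concrete, here is one way the robust byte-at-a-time variant could be wrapped in a template. This is a sketch under stated assumptions: byte_buffer is a hypothetical class, the input is taken to be big-endian, and only unsigned integer types are handled (a float, as in the earlier snippet, would need its bits copied from a same-sized integer, which is not shown).

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical buffer wrapper; reads big-endian unsigned integers of any width.
class byte_buffer {
public:
    explicit byte_buffer(const unsigned char* data) : ptr_(data) {}

    template <typename T>
    T read() {
        static_assert(std::is_unsigned<T>::value, "sketch handles unsigned integers only");
        std::uint64_t value = 0;
        // One byte at a time: no alignment requirement, no dependence on host byte order.
        for (std::size_t i = 0; i < sizeof(T); ++i) {
            value = (value << 8) | ptr_[i];
        }
        ptr_ += sizeof(T);
        return static_cast<T>(value);
    }

private:
    const unsigned char* ptr_;
};
```

Whether a given compiler collapses the loop into a single load plus bswap is not guaranteed; the point is only that the code stays correct on strict-alignment machines.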

Oh, right, we are reading HEADERS and stuff; I don't think that is a bottleneck in too many applications. If some codec is doing some really TIGHT inner loop, again, reading into a temporary buffer and decoding from there is well advised. Same principle: no one reads a byte at a time from a file when processing a large volume of data. Well, actually, I have seen that kind of code very often, and the usual reply to "why do you do it" is that the file systems do block reads and that the bytes come from memory anyway. True, but they go through a deep call stack, which is high overhead for getting a few bytes!

Still, write the parser code once and use it a zillion times -> epic win.

Reading directly into a struct from a file: DON'T DO IT, FOLKS!

It affects each member independently, not the whole struct. Also, it does not affect the order of elements in things like arrays. For instance, it just makes the bytes of an int be stored in reverse order.

PS. That said, there could be a machine with weird endianness. What I just said applies to the most commonly used machines (x86, ARM, PowerPC, SPARC).

You have to correct the endianness of each member of more than one byte, individually. Strings (fooword and barword) do not need to be converted, as they can be seen as sequences of bytes.

However, you must take care of another problem: alignment of the members in your struct. Basically, you must check whether sizeof(RECORD) is the same in both the unix and the Windows code. Compilers usually provide pragmas to define the alignment you want (for example, #pragma pack).

You also have to consider alignment differences between the two compilers. Each compiler is allowed to insert padding between members in a structure in the way that best suits the architecture. So you really need to know:

  • How the UNIX program writes to the file.
  • If it is a binary copy of the object, the exact layout of the structure.
  • If it is a binary copy, the endianness of the source architecture.

This is why most programs that I have seen (that need to be platform-neutral) serialize the data as a text stream that can be easily read by the standard iostreams.
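A minimal sketch of the text-stream idea, assuming the writing program can be changed: the fields are written as whitespace-separated text and read back with the standard stream operators. TextRecord is a hypothetical variant of the record that uses std::string for the two words, and the sketch assumes the words are non-empty and contain no whitespace.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical text-friendly version of the record (strings instead of CHAR[11]).
struct TextRecord {
    std::uint32_t foo;
    std::uint32_t bar;
    std::string fooword;
    std::string barword;
    std::uint16_t baz;
};

// Write the fields as whitespace-separated text; byte order and padding never come into it.
inline std::ostream& operator<<(std::ostream& os, const TextRecord& r) {
    return os << r.foo << ' ' << r.bar << ' ' << r.fooword << ' '
              << r.barword << ' ' << r.baz << '\n';
}

// Read the fields back in the same order.
inline std::istream& operator>>(std::istream& is, TextRecord& r) {
    return is >> r.foo >> r.bar >> r.fooword >> r.barword >> r.baz;
}
```

The price is larger files and slower parsing than a binary format, which may or may not matter for the data at hand.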

I like to implement a ByteSwap function for each data type that needs swapping, like this:

inline u_int ByteSwap(u_int in)
{
    u_int out;
    char *indata = (char *)&in;
    char *outdata = (char *)&out;
    outdata[0] = indata[3] ;
    outdata[3] = indata[0] ;

    outdata[1] = indata[2] ;
    outdata[2] = indata[1] ;
    return out;
}

inline u_short ByteSwap(u_short in)
{
    u_short out;
    char *indata = (char *)&in;
    char *outdata = (char *)&out;
    outdata[0] = indata[1] ;
    outdata[1] = indata[0] ;
    return out;
}

Then I add a function to the structure that needs swapping, like this:

struct RECORD {
  UINT32 foo;
  UINT32 bar;
  CHAR fooword[11];
  CHAR barword[11];
  UINT16 baz;
  void SwapBytes()
  {
    foo = ByteSwap(foo);
    bar = ByteSwap(bar);
    baz = ByteSwap(baz);
  }
};

Then you can modify your code that reads (or writes) the structure like this:

fstream f;
f.open("file.bin", ios::in | ios::binary);

RECORD r;

f.read((char*)&r, sizeof(RECORD));
r.SwapBytes();

cout << "fooword = " << r.fooword << endl;

To support different platforms you just need a platform-specific implementation of each ByteSwap overload.
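As one possible shape for such platform-specific implementations, the compiler's byte-swap intrinsics can be selected with the preprocessor, falling back to plain shifts elsewhere. ByteSwap32 and ByteSwap16 are hypothetical names chosen for this sketch; _byteswap_ulong and _byteswap_ushort are the MSVC intrinsics, and __builtin_bswap32 and __builtin_bswap16 are the GCC/Clang builtins.

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <stdlib.h>  // _byteswap_ulong, _byteswap_ushort
#endif

// Swap the byte order of a 32-bit value.
inline std::uint32_t ByteSwap32(std::uint32_t in) {
#if defined(_MSC_VER)
    return _byteswap_ulong(in);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap32(in);
#else
    // Portable fallback using shifts and masks.
    return (in << 24) | ((in << 8) & 0x00ff0000u) |
           ((in >> 8) & 0x0000ff00u) | (in >> 24);
#endif
}

// Swap the byte order of a 16-bit value.
inline std::uint16_t ByteSwap16(std::uint16_t in) {
#if defined(_MSC_VER)
    return _byteswap_ushort(in);
#elif defined(__GNUC__) || defined(__clang__)
    return __builtin_bswap16(in);
#else
    return static_cast<std::uint16_t>((in << 8) | (in >> 8));
#endif
}
```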

Something like this should work:

#include <algorithm>
#include <fstream>
using namespace std;

struct RECORD {
    UINT32 foo;
    UINT32 bar;
    CHAR fooword[11];
    CHAR barword[11];
    UINT16 baz;
};

// Reverse the bytes of one field in place
void ReverseBytes( void *start, int size )
{
    char *beg = static_cast<char *>( start );
    char *end = beg + size;

    std::reverse( beg, end );
}

int main() {
    fstream f;
    f.open( "file.bin", ios::in | ios::binary );

    // for each entry {
    RECORD r;
    f.read( (char *)&r, sizeof( RECORD ) );
    ReverseBytes( &r.foo, sizeof( UINT32 ) );
    ReverseBytes( &r.bar, sizeof( UINT32 ) );
    ReverseBytes( &r.baz, sizeof( UINT16 ) );
    // }

    return 0;
}
