
Byte swap during copy

I need to efficiently swap the byte order of an array while copying it into another array.

The source array has a known element type (char, short or int), so the byte swapping required is unambiguous and follows from that type.

My plan is to do this very simply with a multi-pass byte-wise copy (2 passes for short, 4 for int, ...). However, are there any pre-existing "memcpy_swap_16/32/64" functions or libraries? Perhaps something from image processing, such as BGR/RGB conversion.

EDIT

I know how to swap the bytes of individual values; that is not the problem. I want to do this as part of a copy that I am going to perform anyway.

For example, if I have an array of little-endian 4-byte integers, I can do the swap by performing 4 byte-wise copies with initial offsets of 0, 1, 2 and 3 and a stride of 4. But there may be a better way; perhaps reading each 4-byte integer individually and using the byte-swap intrinsics _byteswap_ushort, _byteswap_ulong and _byteswap_uint64 would be faster. I suspect there must be existing functions that do this type of processing.
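For reference, the multi-pass strided copy described above might look roughly like the sketch below. The name copy_swap_strided and its parameters are made up for illustration, and the buffers are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy `count` elements of `elem_size` bytes each,
 * reversing the bytes of every element. It makes one pass over the data
 * per byte position (2 passes for short, 4 for int, ...). */
void copy_swap_strided(void *dst, const void *src,
                       size_t count, size_t elem_size)
{
    assert(elem_size > 0);
    const unsigned char *s = src;
    unsigned char *d = dst;

    for (size_t pass = 0; pass < elem_size; pass++) {
        /* Byte `pass` of each source element goes to the mirrored offset. */
        size_t mirrored = elem_size - 1 - pass;
        for (size_t i = 0; i < count; i++) {
            d[i * elem_size + mirrored] = s[i * elem_size + pass];
        }
    }
}
```

Each pass touches every element once, so the whole buffer is traversed elem_size times; that is the cost the single-pass alternatives try to avoid.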

EDIT 2

Just found this, which may be a useful basis for an SSE version, though it's true that memory bandwidth probably makes it a waste of time:

Fast vectorized conversion from RGB to BGRA

Unix systems have a swab function that does what you want for 16-bit arrays. It's probably optimized, but I'm not sure. Note that modern gcc will generate extremely efficient code if you just write the naive byte swap code:

uint32_t x, y;
y = (x<<24) | (x<<8 & 0xff0000) | (x>>8 & 0xff00) | (x>>24);

i.e. it will use the bswap instruction on i486+. Presumably putting this in a loop will give an efficient loop too...
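For the 16-bit case, a call to swab might look like the sketch below. The wrapper name copy_swap16 is made up for illustration; swab itself is the POSIX function declared in <unistd.h>, and it takes a byte count rather than an element count.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* request the swab() declaration from <unistd.h> */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical wrapper: copy `count` 16-bit values from src to dst,
 * swapping the two bytes of each value. Buffers must not overlap. */
void copy_swap16(uint16_t *dst, const uint16_t *src, size_t count)
{
    assert(dst != src);
    /* swab() swaps each adjacent pair of bytes while copying. */
    swab(src, dst, (ssize_t)(count * sizeof *src));
}
```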

Edit: For your copying task, I would do the following in your loop:

  1. Read a 32-bit value from const uint32_t *src.
  2. Use the above code to swap it.
  3. Write a 32-bit value to uint32_t *dest.
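Put together, the three steps above could be sketched like this. The function name copy_swap32 is hypothetical, and the buffers are assumed not to overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `count` 32-bit values, byte-swapping each one. */
void copy_swap32(uint32_t *dest, const uint32_t *src, size_t count)
{
    assert(count == 0 || (dest != NULL && src != NULL));
    for (size_t i = 0; i < count; i++) {
        uint32_t x = src[i];                            /* 1. read  */
        uint32_t y = (x << 24) | (x << 8 & 0xff0000)
                   | (x >> 8 & 0xff00) | (x >> 24);     /* 2. swap  */
        dest[i] = y;                                    /* 3. write */
    }
}
```

Whether the compiler turns the shift-and-mask expression into a single bswap depends on the compiler version and optimization level, so it is worth checking the generated assembly.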

Strictly speaking this may not be portable (aliasing violations), but as long as the copy function is in its own translation unit and not getting inlined, there's very little to worry about. Forget what I wrote about aliasing; if you're swapping the data as 32-bit values, it almost surely was 32-bit values to begin with, not some other type of pointer that was cast, so there's no issue.

On Linux, you should check the header bits/byteswap.h. There's a family of macros of the form bswap_##, and some of them use assembly instructions where appropriate.
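A minimal sketch of a copy loop using these macros follows. It assumes glibc, where the public header to include is <byteswap.h> (bits/byteswap.h is not meant to be included directly); the function name copy_bswap64 is made up for illustration.

```c
#include <assert.h>
#include <byteswap.h>   /* glibc-specific: bswap_16, bswap_32, bswap_64 */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `count` 64-bit values, byte-swapping each one.
 * Buffers are assumed not to overlap. */
void copy_bswap64(uint64_t *dest, const uint64_t *src, size_t count)
{
    assert(count == 0 || (dest != NULL && src != NULL));
    for (size_t i = 0; i < count; i++) {
        dest[i] = bswap_64(src[i]);
    }
}
```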

Yes, there are existing functions like the one linked in the question, but it's not worth the effort because the size of the data (in this case) means the setup overhead is too high. So instead, it's better to just read 2, 4 or 8 bytes at a time, do the swap using intrinsics, and write the result back.
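As a rough sketch of that approach for the 4-byte case: the code below uses _byteswap_ulong on MSVC and, as an assumption not mentioned in this thread, falls back to __builtin_bswap32 on GCC and Clang. The name copy_swap_bytes32 and the SWAP32 macro are hypothetical. Going through memcpy for the loads and stores avoids alignment and aliasing concerns when the buffers are plain bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(_MSC_VER)
#include <stdlib.h>
#define SWAP32(x) _byteswap_ulong(x)       /* MSVC intrinsic */
#else
#define SWAP32(x) __builtin_bswap32(x)     /* assumed GCC/Clang builtin */
#endif

/* Hypothetical helper: copy `nbytes` bytes (a multiple of 4) from src to
 * dst, reversing each 4-byte group. Buffers must not overlap. */
void copy_swap_bytes32(void *dst, const void *src, size_t nbytes)
{
    assert(nbytes % 4 == 0);
    const unsigned char *s = src;
    unsigned char *d = dst;

    for (size_t i = 0; i < nbytes; i += 4) {
        uint32_t v;
        memcpy(&v, s + i, sizeof v);   /* read 4 bytes */
        v = SWAP32(v);                 /* swap with the intrinsic */
        memcpy(d + i, &v, sizeof v);   /* write 4 bytes */
    }
}
```

The 2-byte and 8-byte variants would follow the same pattern with the corresponding 16-bit and 64-bit intrinsics.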
