Precisely converting float 32 to unsigned short or unsigned char

Question

First of all, sorry if this is a duplicate; I couldn't find any thread answering my question.

I'm writing a small program that converts 32-bit floating-point values to short int (16 bits) and unsigned char (8 bits) values. This is for HDR images.

From here I could get the following function (without clamping):

static inline uint8_t u8fromfloat(float x)
{
    return (int)(x * 255.0f);
}

I suppose that, in the same way, we could get a short int by multiplying by (pow(2, 16) - 1).
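For illustration, here is a minimal sketch of what the 16-bit version might look like, assuming the target is really an unsigned 16-bit value and the input is nominally in [0, 1]. The name u16_from_float is hypothetical, and the clamping and rounding are additions that the 8-bit function above does not have. The constant 65535.0f stands in for pow(2, 16) - 1 so that nothing is computed at run time.

```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): maps x in [0, 1]
   to [0, 65535] with clamping and round-to-nearest.
   65535.0f is exactly representable as a float. */
static inline uint16_t u16_from_float(float x)
{
    if (!(x > 0.0f)) return 0;      /* negatives, zero and NaN */
    if (x >= 1.0f) return 65535;    /* clamp the top of the range */
    return (uint16_t)(x * 65535.0f + 0.5f);
}
```

With this sketch, u16_from_float(0.5f) gives 32768, and out-of-range inputs saturate instead of wrapping.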

But then I started thinking about ordered dithering, and Bayer dithering in particular. To convert to uint8_t I suppose I could use a 4x4 matrix, and an 8x8 matrix for unsigned short.
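A rough sketch of the ordered-dithering idea for the 8-bit case, assuming the standard 4x4 Bayer index matrix and input in [0, 1]. The function name and the pixel-coordinate parameters px and py are hypothetical. The only change from plain rounding is that the constant 0.5 offset becomes a per-pixel threshold taken from the matrix.

```c
#include <stdint.h>

/* Standard 4x4 Bayer index matrix (values 0..15). */
static const uint8_t bayer4[4][4] = {
    { 0,  8,  2, 10},
    {12,  4, 14,  6},
    { 3, 11,  1,  9},
    {15,  7, 13,  5},
};

/* Hypothetical helper: quantizes v in [0, 1] to 8 bits at pixel (px, py).
   The threshold lies strictly between 0 and 1 and replaces the
   constant 0.5 used for ordinary rounding. */
static inline uint8_t u8_from_float_dithered(float v, unsigned px, unsigned py)
{
    float threshold = (bayer4[py & 3][px & 3] + 0.5f) / 16.0f;
    float scaled = v * 255.0f + threshold;
    if (!(scaled > 0.0f)) return 0;     /* negatives and NaN */
    if (scaled >= 255.0f) return 255;   /* clamp the top of the range */
    return (uint8_t)scaled;             /* truncation after the offset */
}
```

For v = 0.5 (that is, 127.5 after scaling), half of the pixels in each 4x4 tile come out as 127 and the other half as 128, so the tile average preserves the fractional level.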

I also thought of a look-up table to speed up the process, like this:

uint16_t LUT[0x10000]; // holds 2^16 values

and storing in it 2^16 unsigned short values, each corresponding to a float. The same table could then be used for uint8_t as well, because of the implicit cast between unsigned short ↔ unsigned int.

But wouldn't a look-up table like this be huge in memory? And how would one fill such a table?

Now I'm confused. What would be best, in your opinion?

EDIT after uwind's answer: Let's say I also want to do a basic color space conversion at the same time; that is, do the color space conversion in float first, and then reduce the result to U8/U16. Wouldn't using a LUT be more efficient in that case? And yes, I would still have the problem of indexing the LUT.
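To make the scenario in the edit concrete, here is a sketch of converting in float first and quantizing last. It assumes the color space conversion can be written as a 3x3 matrix applied to an RGB triple; the matrix is a caller-supplied parameter, not a real color space definition, and both function names are hypothetical.

```c
#include <stdint.h>

/* Clamp-and-round quantizer for values nominally in [0, 1]. */
static inline uint8_t quantize_u8(float v)
{
    if (!(v > 0.0f)) return 0;      /* negatives and NaN */
    if (v >= 1.0f) return 255;
    return (uint8_t)(v * 255.0f + 0.5f);
}

/* Hypothetical pixel conversion: m is a caller-supplied 3x3 matrix in
   row-major order. All arithmetic stays in float; quantization is
   the very last step. */
static inline void convert_pixel(const float m[9], const float in[3], uint8_t out[3])
{
    for (int r = 0; r < 3; ++r) {
        float v = m[3 * r] * in[0] + m[3 * r + 1] * in[1] + m[3 * r + 2] * in[2];
        out[r] = quantize_u8(v);
    }
}
```

The point of the sketch is only the ordering: the matrix multiply runs at full float precision, so no table has to be indexed until after the value has already been turned into an integer.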

Answer 1

The way I see it, the look-up table won't help, since in order to index into it you need to convert the float into some integer type. Catch-22.

The table would require 0x10000 * sizeof (uint16_t) bytes, which is 128 KB. Not a lot by modern standards, but on the other hand cache is precious. But, as I said, the table doesn't add much to the solution, since you need to convert the float to an integer in order to index it.

You could build a table indexed by the raw bits of the float reinterpreted as an integer, but the index would have to be 32 bits wide, which makes the table very large (8 GB or so).

Go for the straightforward runtime conversion you outlined.
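The two size figures above can be checked with a few lines of C. This is only a sketch: float_bits and lut_bytes are hypothetical helper names, and the bit pattern mentioned afterwards assumes IEEE 754 single precision.

```c
#include <stdint.h>
#include <string.h>

/* Reinterprets the raw bits of a float as a 32-bit integer.
   memcpy sidesteps aliasing problems. */
static inline uint32_t float_bits(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits;
}

/* Size in bytes of a table of uint16_t entries whose index is
   index_bits wide (index_bits must be below 63 here). */
static inline uint64_t lut_bytes(unsigned index_bits)
{
    return ((uint64_t)1 << index_bits) * sizeof(uint16_t);
}
```

lut_bytes(16) is 131072 bytes (128 KB), while lut_bytes(32) is 8589934592 bytes (8 GB). float_bits(1.0f) is 0x3F800000, which shows how large and sparse the raw-bit index would be even for inputs restricted to [0, 1].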

Answer 2

Just stay with the multiplication; it'll work fine.

Practically all modern CPUs have vector instructions (SSE, AVX, ...) suited to this kind of work, so you might look at programming for those. Or use a compiler that automatically vectorizes your code, if possible (Intel C, also GCC). Even in cases where a table lookup is a possible solution, this can often be faster because you don't suffer from memory latency.
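As a sketch of the "let the compiler vectorize it" route: a plain loop over contiguous buffers with branch-free clamping is the kind of code an auto-vectorizer can usually handle. The function name convert_u8 is hypothetical, and whether the loop actually gets vectorized depends on the compiler and flags (for example GCC with -O3), so treat that as an assumption to verify. The input is assumed to contain no NaN values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical batch converter. `restrict` promises that the buffers do
   not overlap, which makes auto-vectorization easier for the compiler.
   Assumes src holds no NaN values. */
void convert_u8(const float *restrict src, uint8_t *restrict dst, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        float v = src[i] * 255.0f + 0.5f;   /* scale and round */
        v = v < 0.0f ? 0.0f : v;            /* clamp low */
        v = v > 255.0f ? 255.0f : v;        /* clamp high */
        dst[i] = (uint8_t)v;
    }
}
```

Each iteration is independent and touches memory sequentially, which is what lets this approach avoid the scattered loads a table lookup would need.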

Answer 3

First, it should be noted that float has 24 bits of precision, which cannot fit into a 16-bit int, let alone 8 bits. Second, float has a much larger range, which can't be stored in any int or long long int.

So your question title is actually incorrect: there is no way to precisely convert an arbitrary float to short or char. What you want is to map a float value between 0 and 1 to an 8-bit or 16-bit int range.

The code you use above will work fine. However, the value 255 is extremely unlikely to be returned, because it needs exactly 1.0 as input; values such as 254.99999 end up being truncated to 254. You should round the value instead:

return (int)(x * 255.0f + .5f);

or better, use the code provided in your link for a more balanced distribution:

static inline uint8_t u8fromfloat_trick(float x)
{
    union { float f; uint32_t i; } u;
    u.f = 32768.0f + x * (255.0f / 256.0f);
    return (uint8_t)u.i;
}
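To compare the three variants side by side, here is a small sketch. The function names are hypothetical, inputs are assumed to lie in [0, 1] (nothing is clamped), and the bit trick assumes IEEE 754 single precision. It reads the bits through memcpy rather than a union, which is a substitution on my part and not the linked code. As far as I can tell, the trick works because floats in [32768, 65536) are spaced 1/256 apart, so the addition rounds x * 255 to the nearest integer and leaves it in the low 8 bits of the mantissa.

```c
#include <stdint.h>
#include <string.h>

/* Plain truncation, as in the question. */
static inline uint8_t u8_truncate(float x) { return (uint8_t)(x * 255.0f); }

/* Round to nearest by adding 0.5 before truncating. */
static inline uint8_t u8_round(float x) { return (uint8_t)(x * 255.0f + 0.5f); }

/* Same arithmetic as the bit trick, with memcpy in place of the union.
   Only meaningful for x in [0, 1]. */
static inline uint8_t u8_trick(float x)
{
    float f = 32768.0f + x * (255.0f / 256.0f);
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return (uint8_t)bits;   /* low 8 bits of the mantissa */
}
```

For x = 0.999f, truncation gives 254, while both the rounding version and the trick give 255.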

Using a LUT wouldn't be any faster, because a table for 16-bit values is too large to fit in cache, and in fact it may reduce your performance greatly. The snippet above needs only 2 floating-point instructions, or only 1 with FMA. And SIMD will improve performance a further 4-32x (or more), so the LUT method would be easily outperformed, as table lookups are much harder to parallelize.

3 answers

Answer 1: score 1, accepted, 2013-01-08 11:12:38

Answer 2: score 0, 2013-01-08 11:16:53

Answer 3: score -1, 2013-10-09 01:08:02
