Convert double to float goes wrong - C#

Question

I need to calculate the bounds for a RectangleF.

For some reason, casting from double to float does not give results as precise as I expected.

Here is an example of such a calculation:


float MinX = 0f, MaxX = 0f;
float MinY = 0f, MaxY = 0f;
float BoundsWidth = 0.2f;
float BoundsHeight = 0.1f;
double BoundsY = 2333638.6551984739;
double BoundsX = 895.0999755859375;

MinX = (float)BoundsX;
MinY = (float)BoundsY;

var MaxX_Defect = BoundsX + BoundsWidth;
var MaxY_Defect = BoundsY + BoundsHeight;
MaxX = (float)(MaxX_Defect);
MaxY = (float)(MaxY_Defect);

When I calculate the height as MaxY - MinY, it evaluates to 0 instead of 0.1f.

How can I fix this?

Answer 1

The line float BoundsHeight = 0.1f; converts 0.1 to the nearest value representable in float, resulting in BoundsHeight being 0.100000001490116119384765625.

The line double BoundsY = 2333638.6551984739; similarly converts to double, setting BoundsY to 2,333,638.6551984739489853382110595703125.

The line MinY = (float)BoundsY; converts that to float, setting MinY to 2,333,638.75.

The line var MaxY_Defect = BoundsY + BoundsHeight; computes using double (I presume; I am not familiar with C# semantics), setting MaxY_Defect to 2,333,638.7551984754391014575958251953125.

The line MaxY = (float)(MaxY_Defect); converts that to float, setting MaxY to 2,333,638.75.

Then we can see that MinY and MaxY have the same value, so of course MaxY-MinY is zero.
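The steps above can be reproduced with a short program. This is only a sketch: the class and method names (FloatCollapseDemo, HeightViaFloat) are made up for illustration, and the floats are widened to double before printing so that the output does not depend on how a particular runtime formats float values.

```csharp
using System;

static class FloatCollapseDemo
{
    // Computes MaxY - MinY the same way the question does.
    public static float HeightViaFloat(double boundsY, float boundsHeight)
    {
        float minY = (float)boundsY;                  // rounds to the nearest float
        float maxY = (float)(boundsY + boundsHeight); // sum is done in double, then rounded
        return maxY - minY;
    }

    static void Main()
    {
        double boundsY = 2333638.6551984739;
        float boundsHeight = 0.1f;

        // Both corners round to the same float value.
        Console.WriteLine((double)(float)boundsY);
        Console.WriteLine((double)(float)(boundsY + boundsHeight));

        // The difference of two identical floats is zero.
        Console.WriteLine(HeightViaFloat(boundsY, boundsHeight));
    }
}
```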

Quite simply, float does not have enough precision to distinguish between 2,333,638.6551984739489853382110595703125 and 2,333,638.7551984754391014575958251953125. At the scale of 2,333,638, the distance between adjacent representable numbers in the float format is 0.25. This is because the format has 24 bits for the significand (the fraction portion of the floating-point representation). 2,333,638 is between 2²¹ and 2²², so the exponent in its floating-point representation scales the significand to have bits representing values from 2²¹ to 2⁻² (from 21 to −2, inclusive, is 24 positions). So changing the significand by 1 in its lowest bit changes the represented number by 2⁻² = 0.25.
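One way to check that spacing is to step to the next float by incrementing the bit pattern. The sketch below assumes a positive, finite input, and the names FloatSpacingDemo and GapAbove are hypothetical. It uses only BitConverter.GetBytes, ToInt32 and ToSingle, so it should not depend on newer runtime helpers.

```csharp
using System;

static class FloatSpacingDemo
{
    // Gap between a positive, finite float and the next larger float.
    // For such values, adding 1 to the raw bit pattern yields the next float.
    public static float GapAbove(float x)
    {
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(x), 0);
        float next = BitConverter.ToSingle(BitConverter.GetBytes(bits + 1), 0);
        return next - x;
    }

    static void Main()
    {
        Console.WriteLine(GapAbove(2333638.75f)); // prints 0.25
        Console.WriteLine(GapAbove(895.1f));      // much finer spacing near 895
    }
}
```

Near 895 the gap is 2⁻¹⁴, which is why the X coordinates in the question still come out roughly 0.2 apart while the Y coordinates collapse.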

Thus, when 2,333,638.655… and 2,333,638.755… are converted to float, they have the same result, 2,333,638.75.

You cannot use float to distinguish between coordinates or sizes that are this close at that magnitude. You can use double or you might be able to translate the coordinates to be nearer the origin (so their magnitudes are smaller, putting them in a region where the float resolution is finer).
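Translating toward the origin might look like the following sketch. The origin value 2333638.0 and the helper name ToLocal are assumptions chosen for illustration; any double-precision origin close to the data would serve the same purpose.

```csharp
using System;

static class LocalCoordinatesDemo
{
    // Hypothetical helper: subtract a double-precision origin first,
    // then narrow the (now small) result to float.
    public static float ToLocal(double value, double origin)
    {
        return (float)(value - origin);
    }

    static void Main()
    {
        double boundsY = 2333638.6551984739;
        float boundsHeight = 0.1f;
        double originY = 2333638.0; // assumed origin, picked near the data

        float minY = ToLocal(boundsY, originY);
        float maxY = ToLocal(boundsY + boundsHeight, originY);

        // The local coordinates are below 1, where float spacing is tiny,
        // so the difference is close to 0.1 instead of 0.
        Console.WriteLine(maxY - minY);
    }
}
```

The origin has to be kept as a double alongside the local coordinates so that world positions can be recovered when needed.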

As long as the final result is small, you could do the intermediate calculations using double but still represent the final result well using float.
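A minimal sketch of that idea, using the values from the question: the subtraction happens in double, and only the small result is narrowed to float. The names SizeInDoubleDemo and Height are hypothetical.

```csharp
using System;

static class SizeInDoubleDemo
{
    // Subtract in double, then narrow only the small difference to float.
    public static float Height(double minY, double maxY)
    {
        return (float)(maxY - minY);
    }

    static void Main()
    {
        double boundsY = 2333638.6551984739;
        float boundsHeight = 0.1f;

        double minY = boundsY;
        double maxY = boundsY + boundsHeight; // kept as double, not cast to float

        Console.WriteLine(Height(minY, maxY)); // close to 0.1 rather than 0
    }
}
```

If the width and height are already known as floats, as BoundsWidth and BoundsHeight are in the question, they can also be used directly instead of being derived again from float corner values.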
