Why does a printf() allow this double to be passed by pointer?

Question

A pair of printf() debugging statements reveals that a pointer to a double that I am passing is, when dereferenced at the receiving end, coming out as a different value — but only under Microsoft Visual Studio (version 9.0). The steps are quite simple:

    double rho=0;       /* distance from the Earth */
    /* ... */
    for (pass = 0; pass < 2; pass++) {
        /* ... */
        rho = sqrt(rsn*rsn+rp*rp-2*rsn*rp*cpsi*cos(ll));
        printf("\nrho from sqrt(): %f\n", rho);
        /* ... */
    }
    /* ... */
    cir_sky (np, lpd, psi, rp, &rho, lam, bet, lsn, rsn, op);
    /* ... */
}
/* ... */
static void
cir_sky (
/* ... */
double *rho,        /* dist from earth: in as geo, back as geo or topo */
/* ... */)
{
    /* ... */
    printf("\nDEBUG1: *rho=%f\n", *rho);

The entire C file is here:

https://github.com/brandon-rhodes/pyephem/blob/9cd81a8a7624b447429b6fd8fe9ee0d324991c3f/libastro-3.7.7/circum.c#L366

I would have expected that the value displayed in the first printf() would be the same as that displayed by the second, since passing a pointer to a double should not result in a different value. And under GCC they are, in fact, always the same value. Under Visual Studio 32-bit compilation they are always the same. But when this code is compiled with Visual Studio under a 64-bit architecture, the two double values are different!

https://ci.appveyor.com/project/brandon-rhodes/pyephem/build/1.0.18/job/4xu7abnl9vx3n770#L573

rho from sqrt(): 0.029624

DEBUG1: *rho=0.000171

This is disconcerting. I wondered: does the code between where rho is computed and where the pointer is finally passed somehow destroy the value by bad pointer arithmetic? So I added one last printf(), right above the cir_sky() call, to see if the value has already been altered by that point or whether it is altered in the course of the call itself:

    printf("\nrho about to be sent: %f\n", rho);
    cir_sky (np, lpd, psi, rp, &rho, lam, bet, lsn, rsn, op);

Here is that line in the context of the whole file:

https://github.com/brandon-rhodes/pyephem/blob/28ba4bee9ec84f58cfffabeda87cc01e972c86f6/libastro-3.7.7/circum.c#L382

And guess what?

Adding the printf() fixed the bug: the pointer to rho that is passed to cir_sky() can now be dereferenced to the correct value!

As can be seen here:

https://ci.appveyor.com/project/brandon-rhodes/pyephem/build/1.0.19/job/s3nh90sk88cpn2ee#L567

rho from sqrt(): 0.029624

rho about to be sent: 0.029624

DEBUG1: *rho=0.029624

I am mystified.

What edge case of the C standard am I running into here? Why does merely using the value rho in the top-level scope of this function force the Microsoft compiler to correctly preserve its value? Is the problem that rho is both set and used inside of a block, and Visual Studio does not deign to preserve its value outside of that block because of a quirk of the C standard that I have never quite internalized?

You can see the entire build output at the AppVeyor link above. The particular compilation step for this C file, in case the problem might be how Visual Studio is invoked or the compile options, is:

C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\Bin\amd64\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ilibastro-3.7.7 -IC:\Python27-x64\include -IC:\Python27-x64\PC /Tclibastro-3.7.7\circum.c /Fobuild\temp.win-amd64-2.7\Release\libastro-3.7.7\circum.obj
circum.c
libastro-3.7.7\circum.c(126) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data 
libastro-3.7.7\circum.c(127) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data
libastro-3.7.7\circum.c(139) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data 
libastro-3.7.7\circum.c(140) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data 
libastro-3.7.7\circum.c(295) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data 
libastro-3.7.7\circum.c(296) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data
libastro-3.7.7\circum.c(729) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data 
libastro-3.7.7\circum.c(730) : warning C4244: '=' : conversion from 'double' to 'float', possible loss of data

None of those warnings are, from what I can see, for code involved in this particular puzzle — and even if they were, all they would signify is that a float value might become less precise (from about 15 digits of decimal precision to 7), not that it could change completely.

Here, again, are the outputs of the two compilation-and-test runs, the first of which failed and the second of which (because of the printf()?) succeeded:

https://ci.appveyor.com/project/brandon-rhodes/pyephem/build/1.0.18/job/4xu7abnl9vx3n770

https://ci.appveyor.com/project/brandon-rhodes/pyephem/build/1.0.19/job/s3nh90sk88cpn2ee

Both are for exactly the same architecture, according to AppVeyor:

Environment: PYTHON=C:\Python27-x64, PYTHON_VERSION=2.7.x, PYTHON_ARCH=64, WINDOWS_SDK_VERSION=v7.0

Answer 1

My quick look at this code didn't come up with anything that stands out as wrong or problematic. But, when a printf solves the problem, that means there is some non-determinism present. Let's analyze possible causes:

Concurrency (data races): the most common cause, but you say the code is single-threaded.
Uninitialized memory: rho is initialized here, but maybe something somewhere else isn't, and it's messing things up. I'd run Valgrind (on Linux) and AddressSanitizer and other sanitizers (should be available with clang and gcc on Windows, too) to see if they come up with something.
Wild pointers and other out-of-bounds accesses: nothing in the code we see here, but it's calling other functions. Again, run Valgrind and the sanitizers.
If the previous steps come up short, the next most probable candidate is an MSVC bug. MSVC is notorious for messing things up in complex code, and this is somewhat complex. Many times I have rearranged code just to make MSVC happy. Sometimes turning off optimization helps, sometimes it doesn't. Ditto for trying out different compiler options. Sometimes there is an update or patch that helps, sometimes there isn't. Ditto for the next version of MSVC. I'd suggest looking at the disassembly in the debugger, but you say you don't have access to a Windows machine. The best bet here would be to try to simplify the code: make the functions smaller and cut down the number of arguments.
There are other possible causes. For example, maybe the stack got messed up for some reason, perhaps when interacting with the Python runtime. Try to build and run it as "regular" C code rather than as a Python extension. Eliminate calls to other functions (if that breaks the calculations, never mind; you're just trying to locate the problem).
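
As a rough illustration of the "simplify it and build it as regular C" suggestion, here is a minimal sketch of a reduced test case. It is not taken from circum.c: callee(), roundtrip_ok(), report(), and the argument list are hypothetical stand-ins that only mirror the shape of the code in the question (a double assigned inside a loop, then passed by pointer to a static function with several arguments). It is written in C89 style so that an older MSVC can compile it as C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Value the callee read through the pointer, recorded without printing. */
static double seen_in_callee;

/* Hypothetical stand-in for cir_sky(): several arguments, one of them a
 * double passed by pointer. The argument list is made up for this sketch. */
static void
callee(int np, double lpd, double psi, double rp,
       double *rho, double lsn, double rsn)
{
    (void)np; (void)lpd; (void)psi; (void)rp; (void)lsn; (void)rsn;
    assert(rho != NULL);
    seen_in_callee = *rho;
}

/* Returns 1 when the callee saw the value the caller computed, else 0.
 * Note that the caller never prints rho, since the question shows that
 * printing it there changes the behavior. */
static int
roundtrip_ok(double rsn, double rp)
{
    double rho = 0;
    int pass;

    for (pass = 0; pass < 2; pass++) {
        rho = rsn*rsn + rp*rp;      /* assigned inside a block, as in the question */
    }
    callee(1, 0.0, 0.0, rp, &rho, 0.0, rsn);

    return seen_in_callee == rsn*rsn + rp*rp;
}

/* Prints one line; intended to be called from a small main(). */
static void
report(double rsn, double rp)
{
    printf("roundtrip %s\n", roundtrip_ok(rsn, rp) ? "ok" : "MISMATCH");
}
```

Calling report(3.0, 4.0) from a main() built with the same cl.exe flags as the failing job would show whether the mismatch survives outside the Python extension. In standard C, rho is declared at function scope and keeps the value assigned inside the loop, so a mismatch here would point at the toolchain rather than at the language rules. A stub this small may well not trigger the problem, in which case pieces of the real function can be added back one at a time.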

In any case, I recommend you get your hands on a Windows machine and debug it. That's the best way to get to the bottom of such problems, in my experience.

Answer 2

Is this an effect of (faulty) optimization?

Turn off optimization (a debug build, or /Od in place of the /Ox on your command line) and see if you get the same effect.

Of course, if you find it is the optimizer, then you are up the creek and can only do something to fool it, e.g. a sprintf() that does nothing.
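
One way to test the optimizer theory without touching the build flags might be to switch optimization off for just the suspect function. The sketch below uses MSVC's `#pragma optimize("", off)` and `#pragma optimize("", on)`, guarded by `_MSC_VER` so that other compilers ignore it. The functions scale_in_place() and compute_and_pass() are hypothetical stand-ins, not code from circum.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for cir_sky(): updates a double through a pointer. */
static void
scale_in_place(double *value)
{
    assert(value != NULL);
    *value *= 2.0;
}

#ifdef _MSC_VER
#pragma optimize("", off)   /* MSVC only: functions defined below are not optimized */
#endif

/* Hypothetical stand-in for the caller: rho is assigned inside a loop and
 * then passed by pointer, as in the question. */
static double
compute_and_pass(double a, double b)
{
    double rho = 0;
    int pass;

    for (pass = 0; pass < 2; pass++) {
        rho = a*a + b*b;
    }
    scale_in_place(&rho);
    return rho;
}

#ifdef _MSC_VER
#pragma optimize("", on)    /* MSVC only: restore the command-line optimization settings */
#endif
```

If wrapping the real caller this way makes the wrong value go away, that supports (but does not prove) the optimizer theory, and the pragma can stay in place as a narrower workaround than a do-nothing sprintf().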

Also, your printf() calls could print out the pointer as well ("%p", (void *) &rho). I don't think the pointer is incorrect, but it is a sanity check in case we are missing something. Note that casting the address to long would truncate it on 64-bit Windows, where long is only 32 bits wide. Finally, a double built from random bits usually ends up with an extreme exponent (somewhere out toward E+/-300), so the 0.000171 result looks too reasonable to be random garbage.
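
A minimal sketch of such a sanity check follows. The helpers double_to_hex() and debug_double() are hypothetical names, not part of libastro. The sketch assumes an 8-byte IEEE 754 double on a little-endian machine (true for x86 and x64) and sticks to C89 so that Visual Studio 9.0 can compile it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the raw bytes of *p as 16 hex digits, most
 * significant byte first. Assumes a little-endian host and an 8-byte double. */
static void
double_to_hex(const double *p, char out[17])
{
    unsigned char bytes[sizeof(double)];
    size_t i;

    assert(sizeof(double) == 8);
    memcpy(bytes, p, sizeof bytes);
    for (i = 0; i < sizeof bytes; i++) {
        sprintf(out + 2*i, "%02x", bytes[sizeof bytes - 1 - i]);
    }
}

/* Hypothetical helper: prints the address, the value, and the bit pattern. */
static void
debug_double(const char *label, const double *p)
{
    char hex[17];

    double_to_hex(p, hex);
    printf("%s: addr=%p value=%.17g bits=%s\n", label, (void *)p, *p, hex);
}
```

Calling debug_double("DEBUG1", rho) at the top of cir_sky() would show whether the address is plausible and whether the bits look like a real double. Since the question shows that an extra printf() in the caller hides the problem, it is probably safest to call this only on the receiving side.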
