
How might a local float variable become corrupted in Java (Android 8.0.0)?

UPDATED with significant new info (see bottom)

EDITED with better logging code

I'm tracking down some graphical corruption in our app, and have traced it to this function (which I have liberally sprinkled with logging):

public final int p_PostDraw(){
    bb_std_lang.print("Dissolve encountered.");
    float t_d=c_GColour.m_dissolve;
    bb_std_lang.print("Cached dissolve value locally: "+c_GColour.m_dissolve+ " " + t_d);
    c_GColour.m_dissolve=c_Gel.m_colstack.p_Top().m_a;
    bb_std_lang.print("Updated dissolve value: " + c_GColour.m_dissolve);
    c_Gel.m_colstack.p_Top().m_a=1.0f;
    bb_std_lang.print("Monitoring t_d="+t_d);
    super.p_PostDraw();
    bb_std_lang.print("Monitoring t_d="+t_d);
    c_GColour.m_dissolve=t_d;
    bb_std_lang.print("Dissolve post restore " + c_GColour.m_dissolve);
    return 0;
}

Most of the time this works as expected, but at a certain point in the game, this is logged:

09-30 14:40:59.086 10545-11101/? I/[Monkey]: Dissolve encountered.
09-30 14:40:59.086 10545-11101/? I/[Monkey]: Cached dissolve value locally: 1.0 1.0
09-30 14:40:59.086 10545-11101/? I/[Monkey]: Updated dissolve value: 0.1
09-30 14:40:59.086 10545-11101/? I/[Monkey]: Monitoring t_d=1.0
09-30 14:40:59.087 10545-11101/? I/[Monkey]: Monitoring t_d=-1.6314132E-19
09-30 14:40:59.087 10545-11101/? I/[Monkey]: Dissolve post restore -1.6314132E-19

To break that down: the static value m_dissolve is copied into a local variable t_d, which holds 1.0f. The static value is then set to 0.1f and used while rendering child objects. When that is complete, t_d is logged again just before it is used to restore the static value, but by then it has mysteriously become -1.6314132E-19 (the garbage value is different, and unpredictable, each time).

I'm not aware of any way a local Java variable could be corrupted in this way.
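
One thing that might help narrow this down is logging the raw bit pattern of the float alongside its printed value, since the bits can hint at whether the garbage looks like a small integer, part of a pointer, or a neighbouring value. A minimal sketch, assuming a hypothetical helper class FloatBits (not part of the app) and using only Float.floatToRawIntBits from the standard library:

```java
// Hypothetical diagnostic helper; FloatBits and describe() are illustrative names.
final class FloatBits {
    private FloatBits() {}

    /** Returns the float's printed value followed by its raw IEEE 754 bits in hex. */
    static String describe(float value) {
        int bits = Float.floatToRawIntBits(value);
        return value + " (0x" + String.format("%08X", bits) + ")";
    }

    public static void main(String[] args) {
        // The expected value and one of the corrupted values from the log above.
        System.out.println(describe(1.0f));
        System.out.println(describe(-1.6314132E-19f));
    }
}
```

In the function above this would be used as bb_std_lang.print("Monitoring t_d=" + FloatBits.describe(t_d)); before and after the call to super.p_PostDraw().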

UPDATE:

As a test I made t_d a member variable of the class containing the code listing above (rather than a local variable), and its value was no longer corrupted.
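
The shape of that test, reduced to a sketch. All names here are hypothetical stand-ins (DissolveState plays the role of c_GColour, FieldBackedGel the role of the class containing p_PostDraw); only the save/restore pattern is taken from the listing above:

```java
// Hypothetical stand-in for c_GColour and its static m_dissolve.
final class DissolveState {
    static float dissolve = 1.0f;

    private DissolveState() {}
}

// Hypothetical stand-in for the class containing p_PostDraw().
class FieldBackedGel {
    // The saved value is kept in a heap field rather than in a local variable.
    private float savedDissolve;

    // Records what the children saw, purely so the sketch can be checked.
    float seenDuringDraw;

    int postDraw(float childAlpha) {
        savedDissolve = DissolveState.dissolve;  // cache the current value
        DissolveState.dissolve = childAlpha;     // value used while drawing children
        drawChildren();                          // stands in for super.p_PostDraw()
        DissolveState.dissolve = savedDissolve;  // restore from the field
        return 0;
    }

    void drawChildren() {
        seenDuringDraw = DissolveState.dissolve;
    }
}
```

Note that a single field is not re-entrant: if postDraw were ever called again on the same instance before the first call returned, the saved value would be overwritten, which a local variable would normally avoid.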

As a further test I then added some local float variables (m_e to m_j) that were used for nothing except printing out before and after super.p_PostDraw(). This is what happened:

09-30 15:20:51.219 28384-28877/? I/[Monkey]: M_E ETC BEFORE: 1.0 1.0 1.0 1.0 1.0 1.0
09-30 15:20:51.220 28384-28877/? I/[Monkey]: M_E ETC AFTER: 6.7E-44 6.7E-44 6.7E-44 6.7E-44 6.7E-44 6.7E-44

I then removed all filters from the logcat and found this lurking between those two lines:

09-30 15:20:51.219 28384-28877/? I/[Monkey]: M_E ETC BEFORE: 1.0 1.0 1.0 1.0 1.0 1.0
09-30 15:20:51.220 28384-28877/? I/zygote64: Deoptimizing int app.hidden.name.c_IffLT.p_Update4(app.hidden.name.c_Gel) due to JIT inline cache
09-30 15:20:51.220 28384-28877/? I/zygote64: Deoptimizing int app.hidden.name.c_Delay.p_Pump2(app.hidden.name.c_Gel) due to JIT inline cache
09-30 15:20:51.220 28384-28877/? I/zygote64: Deoptimizing int app.hidden.name.c_SetUniform_4F.p_Update4(app.hidden.name.c_Gel) due to JIT inline cache
09-30 15:20:51.220 28384-28877/? I/zygote64: Deoptimizing int app.hidden.name.c_EX_VarString.p_Set7(app.hidden.name.c_Expression) due to JIT inline cache
09-30 15:20:51.220 28384-28877/? I/zygote64: Deoptimizing java.lang.String app.hidden.name.c_EX_Format.p_AsString() due to JIT inline cache
09-30 15:20:51.220 28384-28877/? I/zygote64: Deoptimizing int app.hidden.name.c_IffNotEqual.p_Update4(app.hidden.name.c_Gel) due to JIT inline cache
09-30 15:20:51.220 28384-28877/? I/zygote64: Deoptimizing int app.hidden.name.c_PinVP.p_Update4(app.hidden.name.c_Gel) due to JIT inline cache
09-30 15:20:51.220 28384-28877/? I/[Monkey]: M_E ETC AFTER: 6.7E-44 6.7E-44 6.7E-44 6.7E-44 6.7E-44 6.7E-44

UPDATE:

I tweaked the code above to set the local variables to different values rather than all being 1.0f, and this happened:

09-30 15:56:37.686 1815-2373/? I/[Monkey]: M_E BEFORE: 2.0 1.0 3.0 4.0 5.0 1.0
09-30 15:56:37.687 1815-2373/? I/zygote64: Deoptimizing int app.hidden.name.c_IffLT.p_Update4(app.hidden.name.c_Gel) due to JIT inline cache
09-30 15:56:37.687 1815-2373/? I/zygote64: Deoptimizing int app.hidden.name.c_Delay.p_Pump2(app.hidden.name.c_Gel) due to JIT inline cache
09-30 15:56:37.687 1815-2373/? I/zygote64: Deoptimizing int app.hidden.name.c_SetUniform_4F.p_Update4(app.hidden.name.c_Gel) due to JIT inline cache
09-30 15:56:37.687 1815-2373/? I/zygote64: Deoptimizing int app.hidden.name.c_EX_VarString.p_Set7(app.hidden.name.c_Expression) due to JIT inline cache
09-30 15:56:37.688 1815-2373/? I/zygote64: Deoptimizing java.lang.String app.hidden.name.c_EX_Format.p_AsString() due to JIT inline cache
09-30 15:56:37.688 1815-2373/? I/zygote64: Deoptimizing int app.hidden.name.c_IffNotEqual.p_Update4(app.hidden.name.c_Gel) due to JIT inline cache
09-30 15:56:37.688 1815-2373/? I/zygote64: Deoptimizing int app.hidden.name.c_PinVP.p_Update4(app.hidden.name.c_Gel) due to JIT inline cache
09-30 15:56:37.688 1815-2373/? I/[Monkey]: M_E AFTER: 2.0 -5.6063644E-30 3.0 4.0 5.0 -5.6063644E-30

In other words, only the local variables set to 1.0f were corrupted. Not the first one declared or anything like that, JUST the ones set to 1.0f. I then tried setting all the local variables to numbers OTHER than 1.0f, and no corruption occurred.
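
For anyone who wants to run the same kind of check, the experiment can be reduced to a small canary method. This is only a sketch: LocalCanary and countCorrupted are hypothetical names, the Runnable stands in for super.p_PostDraw(), and an optimizing compiler is free to fold the comparisons away, so the absence of a report proves nothing.

```java
// Hypothetical canary; not the code that produced the logs above.
final class LocalCanary {
    private LocalCanary() {}

    /** Keeps sentinel locals live across body.run() and counts how many changed. */
    static int countCorrupted(Runnable body) {
        float a = 1.0f;
        float b = 2.0f;
        float c = 3.0f;

        body.run(); // the call across which corruption was observed

        int corrupted = 0;
        if (a != 1.0f) corrupted++; // also true if the value became NaN
        if (b != 2.0f) corrupted++;
        if (c != 3.0f) corrupted++;
        return corrupted;
    }

    public static void main(String[] args) {
        System.out.println("corrupted locals: " + countCorrupted(() -> { }));
    }
}
```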

Surely there must be some kind of prize for the most obscure bug ever? That a local variable will get corrupted if the ART deoptimises functions, but only if that local variable has the value 1.0?

Quoting the question: "I'm not aware of any way a local Java variable could be corrupted in this way."

It could happen if some native code somewhere is trampling the stack frame that contains the t_d variable.

It could also happen if there is a race condition or memory hazard in this section:

   float t_d=c_GColour.m_dissolve;
   bb_std_lang.print("Dissolve pre stack: " + c_GColour.m_dissolve);

If you look carefully, you are not actually printing t_d. You are printing the (apparent) value of c_GColour.m_dissolve after it has been assigned to t_d, and it might have changed in between.

(Note that you are accessing a bare variable m_dissolve, apparently without any synchronization. There is a potential race condition here even if m_dissolve is declared as volatile.)

This is a really weird issue, and it appears to be a bug in the ART JIT. I can see at least 3 different developer stories describing the same problem. I suspect there are a lot more of them, but the bug is really difficult to reproduce.

I ran into this issue while developing a game with the LibGDX framework. UI in this framework is built with a lot of complex calculations that make heavy use of floats. In my case the UI components received wrong coordinates, so the layout was completely broken.

The weird thing is that you can't reproduce the issue with a DEBUG apk, only with a RELEASE apk. Setting android:debuggable=true in the manifest doesn't help either. So debugging is really painful: you have to monitor logcat and verify the values of the float variables.

There is an issue, created by @Peeling, so if you have a related problem please ask the Android devs to fix it: https://issuetracker.google.com/issues/141825192

Workaround

I forced the runtime to perform the "Deoptimization" of the UI-related methods before I actually need them. I created a mock "Table" component, filled it with other mock components (creating a layout similar to the real one) and called the methods that I had seen in the logs when Deoptimization was happening. I do this every time the application starts, and the issue seems to be "fixed" for me: "Deoptimization" never happens after that point, and the layout is always correct afterwards.

I hope it will help someone who also spent a lot of time on this issue.

I've also had this problem in my released game built with LibGDX, just like tscissors.

After many months of debugging, I am now sure that the problems I had were caused by corrupted float variables, as described in this post.

An easy workaround helped for me: setting the android:vmSafeMode attribute to true in AndroidManifest.xml eliminated the bug, since it disables the JIT optimizations.

See the Android documentation:

android:vmSafeMode

Indicates whether the app would like the virtual machine (VM) to operate in safe mode. The default value is "false". This attribute was added in API level 8 where a value of "true" disabled the Dalvik just-in-time (JIT) compiler.

This attribute was adapted in API level 22 where a value of "true" disabled the ART ahead-of-time (AOT) compiler.
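
As a manifest fragment, the workaround would look roughly like this. android:vmSafeMode is an attribute of the <application> element; everything else the element normally carries in a real manifest is left out here.

```xml
<!-- AndroidManifest.xml (fragment): only the relevant attribute is shown -->
<application
    android:vmSafeMode="true">
    <!-- activities, services, etc. unchanged -->
</application>
```

This presumably trades some runtime performance for the workaround, so it is worth measuring before shipping.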
