Testing initialization safety of final fields

I am trying to test the initialization safety of final fields as guaranteed by the JLS. It is for a paper I'm writing. However, I am unable to get it to 'fail' with my current code. Can someone tell me what I'm doing wrong, or whether this is just something I have to run over and over again until I see a failure with some unlucky timing?

Here is my code:

public class TestClass {

    final int x;
    int y;
    static TestClass f;

    public TestClass() {
        x = 3;
        y = 4;
    }

    static void writer() {
        TestClass.f = new TestClass();
    }

    static void reader() {
        if (TestClass.f != null) {
            int i = TestClass.f.x; // guaranteed to see 3
            int j = TestClass.f.y; // could see 0

            System.out.println("i = " + i);
            System.out.println("j = " + j);
        }
    }
}

and my threads are calling it like this:

public class TestClient {

    public static void main(String[] args) {

        for (int i = 0; i < 10000; i++) {
            Thread writer = new Thread(new Runnable() {
                @Override
                public void run() {
                    TestClass.writer();
                }
            });

            writer.start();
        }

        for (int i = 0; i < 10000; i++) {
            Thread reader = new Thread(new Runnable() {
                @Override
                public void run() {
                    TestClass.reader();
                }
            });

            reader.start();
        }
    }
}

I have run this scenario many, many times. My current loops spawn 10,000 threads, but I've also tried 1,000, 100,000, and even a million. Still no failure. I always see 3 for x and 4 for y. How can I get this to fail?

I wrote the spec. The TL;DR version of this answer is that just because it may see 0 for y, that doesn't mean it is guaranteed to see 0 for y.

In this case, the final field spec guarantees that you will see 3 for x, as you point out. Think of the writer thread as having 4 instructions:

r1 = <create a new TestClass instance>
r1.x = 3;
r1.y = 4;
f = r1;

You might fail to see 3 for x if the compiler reordered this code as follows:

r1 = <create a new TestClass instance>
f = r1;
r1.x = 3;
r1.y = 4;

The way the guarantee for final fields is usually implemented in practice is to ensure that the constructor finishes before any subsequent program actions take place. Imagine someone erected a big barrier between r1.y = 4 and f = r1. So, in practice, if you have any final fields for an object, you are likely to get visibility for all of them.
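One way to picture that "big barrier" in source form is with the explicit fence methods on java.lang.invoke.VarHandle (available from Java 9). The sketch below is an illustration only: the class name FencedPublication and its Point type are made up, the fields are ordinary non-final fields, and it makes no claim about how any particular JVM actually implements final-field semantics.

```java
import java.lang.invoke.VarHandle;

final class FencedPublication {

    // Hypothetical stand-in for the object being published; fields are plain on purpose.
    static final class Point {
        int x;
        int y;
    }

    // Plain (non-volatile) field used for publication.
    static Point shared;

    static void writer() {
        Point p = new Point();
        p.x = 3;
        p.y = 4;
        // The "barrier": the stores above may not be reordered with the store below.
        VarHandle.releaseFence();
        shared = p;
    }

    static int[] reader() {
        Point p = shared;
        if (p == null) {
            return null; // nothing published yet
        }
        // Pairs with the release fence in writer(): field loads stay after the load of shared.
        VarHandle.acquireFence();
        return new int[] { p.x, p.y };
    }

    public static void main(String[] args) {
        writer();
        int[] seen = reader();
        System.out.println("x = " + seen[0] + ", y = " + seen[1]);
    }
}
```

With the fence in place both fields are ordered before the publication, which mirrors the remark that in practice all fields tend to become visible together once any barrier is emitted.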

Now, in theory, someone could write a compiler that isn't implemented that way. In fact, many people have often talked about testing code by writing the most malicious compiler possible. This is particularly common among the C++ people, who have lots and lots of undefined corners of their language that can lead to terrible bugs.

From Java 5.0, you are guaranteed that all threads will see the final state set by the constructor.

If you want to see this fail, you could try an older JVM like 1.3.

I wouldn't print out every test; I would only print out the failures. You could get one failure in a million and miss it. But if you only print failures, they should be easy to spot.

A simpler way to see this fail is to add the following to the writer:

f.y = 5;

and test for:

int y = TestClass.f.y; // could see 0, 4 or 5
if (y != 5)
    System.out.println("y = " + y);
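The two suggestions in this answer (report only the unexpected values, and have the writer store a second value into y) could be combined roughly as follows. This is a minimal sketch, not a guaranteed reproducer: the class name FailureOnlyHarness, the runTrial helper and the iteration count are assumptions made for illustration, and because f is a plain field the JIT is free to hoist the read out of the reader loop, so a run may legitimately report nothing interesting.

```java
final class FailureOnlyHarness {

    static final class TestClass {
        final int x;
        int y;

        TestClass() {
            x = 3;
            y = 4;
        }
    }

    // Deliberately a plain (non-volatile) field, so publication is a data race.
    static TestClass f;

    /** Returns counts of { x != 3, y == 0, y == 4, y == 5 } as seen by the reader. */
    static long[] runTrial(int iterations) {
        f = null;
        long[] counts = new long[4];

        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                TestClass t = new TestClass();
                f = t;
                t.y = 5; // the extra store suggested above (t is the object just stored in f)
            }
        });

        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                TestClass t = f;
                if (t == null) continue;
                if (t.x != 3) counts[0]++;
                int y = t.y;
                if (y == 0) counts[1]++;
                else if (y == 4) counts[2]++;
                else if (y == 5) counts[3]++;
            }
        });

        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join(); // after join, the reader's counts are visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during trial", e);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] c = runTrial(5_000_000);
        // Report only the interesting outcomes instead of printing every read.
        if (c[0] > 0) System.out.println("x != 3 observed " + c[0] + " times");
        if (c[1] > 0) System.out.println("y == 0 observed " + c[1] + " times");
        if (c[2] > 0) System.out.println("y == 4 observed " + c[2] + " times");
        System.out.println("y == 5 observed " + c[3] + " times");
    }
}
```

Counting in the reader and printing once at the end also keeps System.out.println, which synchronizes internally, out of the racy loop.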

I'd like to see a test which fails, or an explanation of why it's not possible with current JVMs.

Multithreading and Testing

You can't prove that a multithreaded application is broken (or not) by testing, for several reasons:

  • the problem might only appear once every x hours of running, with x so high that you are unlikely to see it in a short test
  • the problem might only appear with some combinations of JVM / processor architectures

In your case, making the test break (i.e. observing y == 0) would require the program to see a partially constructed object in which some fields have been properly constructed and some have not. This typically does not happen on x86 / HotSpot.

How to determine if multithreaded code is broken?

The only way to prove that the code is valid or broken is to apply the JLS rules to it and see what the outcome is. With data race publishing (no synchronization around the publication of the object or of y), the JLS provides no guarantee that y will be seen as 4 (it could be seen with its default value of 0).
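For contrast, here is a minimal sketch of the same publication pattern with the data race removed by making the publishing field volatile. The names SafePublicationDemo, Holder and countStaleReads are invented for this example. Applying the JLS rules, the volatile write and the volatile read that observes it form a happens-before edge, so the reader must see y == 4 even though y is not final.

```java
final class SafePublicationDemo {

    static final class Holder {
        int y; // non-final on purpose

        Holder() {
            y = 4;
        }
    }

    // volatile: a write here happens-before every read that observes it.
    static volatile Holder published;

    /** Counts reads of y that did not return 4; the JLS requires this to be 0. */
    static int countStaleReads(int iterations) {
        published = null;
        int[] stale = new int[1];

        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                published = new Holder();
            }
        });

        Thread reader = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                Holder h = published;
                if (h != null && h.y != 4) {
                    stale[0]++;
                }
            }
        });

        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during trial", e);
        }
        return stale[0];
    }

    public static void main(String[] args) {
        System.out.println("stale reads: " + countStaleReads(1_000_000));
    }
}
```

Removing the volatile modifier turns this back into the racy case discussed above, where a count of 0 is likely in practice but no longer guaranteed.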

Can that code really break?

In practice, some JVMs will be better at making the test fail. For example, some compilers (cf. "A test case showing that it doesn't work" in this article) could transform TestClass.f = new TestClass(); into something like the following (because it is published via a data race):

(1) allocate memory
(2) write fields default values (x = 0; y = 0) //always first
(3) write final fields final values (x = 3)    //must happen before publication
(4) publish object                             //TestClass.f = new TestClass();
(5) write non final fields (y = 4)             //has been reordered after (4)

The JLS mandates that (2) and (3) happen before the object publication (4). However, due to the data race, no guarantee is given for (5): it would actually be a legal execution if a thread never observed that write operation. With the proper thread interleaving, it is therefore conceivable that if reader runs between (4) and (5), you will get the desired output.
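To make the interleaving concrete, the five steps can be replayed by hand in a single thread. This is only a simulation under stated assumptions: the class ReorderingSimulation and its Obj type are hypothetical, x is modeled as a plain field because real Java code cannot assign a final field after construction, and the "reader" is simply a method called between the steps.

```java
import java.util.ArrayList;
import java.util.List;

final class ReorderingSimulation {

    // Plain fields stand in for the raw memory of the object being constructed.
    static final class Obj {
        int x;
        int y;
    }

    static Obj f;

    // What a reader thread would report if it ran at this exact moment.
    static String readerSnapshot() {
        Obj o = f;
        return (o == null) ? "f == null" : "x=" + o.x + ", y=" + o.y;
    }

    static List<String> run() {
        f = null;
        List<String> seen = new ArrayList<>();

        Obj o = new Obj();           // (1) allocate memory, (2) fields get default values
        seen.add(readerSnapshot());  // not published yet

        o.x = 3;                     // (3) write the "final" field
        seen.add(readerSnapshot());  // still not published

        f = o;                       // (4) publish the object
        seen.add(readerSnapshot());  // reader running between (4) and (5)

        o.y = 4;                     // (5) non-final write, reordered after publication
        seen.add(readerSnapshot());
        return seen;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The third snapshot is the interesting one: it reports x=3, y=0, which is exactly the outcome the question is trying to provoke.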

I don't have a Symantec JIT at hand, so I can't prove it experimentally :-)

Here is an example of default values of non-final fields being observed even though the constructor sets them and doesn't leak this. It is based on my other question, which is a bit more complicated. I keep seeing people say it can't happen on x86, but my example happens on x64 Linux with OpenJDK 6...

This is a good question with a complicated answer. I've split it in pieces for an easier read.

People have said here enough times that, under the strict rules of the JLS, you should be able to see the desired behavior. But compilers (I mean C1 and C2), while they have to respect the JLS, can make optimizations. I will get to this later.

Let's take the first, easy scenario, where there are two non-final variables, and see if we can publish an incorrect object. For this test, I am using a specialized tool (jcstress) that was tailored for exactly this kind of test. Here is a test using it:

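import org.openjdk.jcstress.annotations.Actor;
import org.openjdk.jcstress.annotations.Expect;
import org.openjdk.jcstress.annotations.JCStressTest;
import org.openjdk.jcstress.annotations.Outcome;
import org.openjdk.jcstress.annotations.State;
import org.openjdk.jcstress.infra.results.II_Result;

@Outcome(id = "0, 2", expect = Expect.ACCEPTABLE_INTERESTING, desc = "not correctly published")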
@Outcome(id = "1, 0", expect = Expect.ACCEPTABLE_INTERESTING, desc = "not correctly published")
@Outcome(id = "1, 2", expect = Expect.ACCEPTABLE, desc = "published OK")
@Outcome(id = "0, 0", expect = Expect.ACCEPTABLE, desc = "II_Result default values for int, not interesting")
@Outcome(id = "-1, -1", expect = Expect.ACCEPTABLE, desc = "actor2 acted before actor1, this is OK")
@State
@JCStressTest
public class FinalTest {

    int x = 1;
    Holder h;

    @Actor
    public void actor1() {
        h = new Holder(x, x + 1);
    }

    @Actor
    public void actor2(II_Result result) {
        Holder local = h;
        // the other actor did its job
        if (local != null) {
            // if correctly published, we can only see {1, 2}
            result.r1 = local.left;
            result.r2 = local.right;
        } else {
            // this is the case to "ignore" default values that are
            // stored in II_Result object
            result.r1 = -1;
            result.r2 = -1;
        }
    }

    public static class Holder {

        // non-final
        int left, right;

        public Holder(int left, int right) {
            this.left = left;
            this.right = right;
        }
    }
}

You do not have to understand the code in depth; the minimal explanation is this: there are two Actors that mutate some shared data, and the results are recorded. The @Outcome annotations classify those recorded results and set certain expectations (under the hood things are far more interesting and verbose). Just bear in mind that this is a very sharp and specialized tool; you can't really do the same thing with two plain threads running.

Now, if I run this, these two outcomes:

 @Outcome(id = "0, 2", expect = Expect.ACCEPTABLE_INTERESTING....)
 @Outcome(id = "1, 0", expect = Expect.ACCEPTABLE_INTERESTING....)

will be observed (meaning there was an unsafe publication of the object, which the other Actor/thread has actually seen).

Specifically, these are observed in the so-called TC2 suite of tests, which are actually run like this:

java... -XX:-TieredCompilation 
        -XX:+UnlockDiagnosticVMOptions 
        -XX:+StressLCM 
        -XX:+StressGCM

I will not dive too deep into what StressLCM, StressGCM and, of course, the TieredCompilation flag do.

The entire point of the test is that:

This code proves that two non-final variables set in the constructor can be incorrectly published, and it runs on x86.


The sane thing to do now, since there is a specialized tool in place, is to change a single field to final and see it break. So let's make this change and run again; we should observe the failure:

public static class Holder {

    // this is the change
    final int right;
    int left;

    public Holder(int left, int right) {
        this.left = left;
        this.right = right;
    }
}

But if we run it again, the failure is not there, i.e. neither of the two @Outcomes discussed above is part of the output. How come?

It turns out that when you write to even a single final variable, the JVM (specifically C1) will do the correct thing, all the time. Even with a single final field, this is impossible to demonstrate. At least at the moment.

In theory you could throw Shenandoah into this, with its interesting flag ShenandoahOptimizeInstanceFinals (not going to dive into it). I have tried running the previous example with:

 -XX:+UnlockExperimentalVMOptions  
 -XX:+UseShenandoahGC  
 -XX:+ShenandoahOptimizeInstanceFinals  
 -XX:-TieredCompilation  
 -XX:+UnlockDiagnosticVMOptions  
 -XX:+StressLCM  
 -XX:+StressGCM 

but this does not work as I hoped it would. What is far worse for my argument for even trying this is that these flags are going to be removed in jdk-14.

Bottom line: at the moment there is no way to break this.

What if you modified the constructor to do this:

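public TestClass() {
    try {
        Thread.sleep(300); // sleep throws a checked exception, so it must be handled
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    }
    x = 3;
    y = 4;
}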

I am not an expert on JLS finals and initializers, but common sense tells me this should delay setting x long enough for writers to register another value?

What if one changes the scenario into

public class TestClass {

    final int x;
    static TestClass f;

    public TestClass() {
        x = 3;
    }

    int y = 4;

    // etc...

}

?

A better understanding of why this test does not fail can come from understanding what actually happens when a constructor is invoked. The JVM is a stack-based machine. TestClass.f = new TestClass(); consists of four actions. First, the new instruction is executed; like malloc in C/C++, it allocates memory and places a reference to it on top of the stack. Then the reference is duplicated for invoking the constructor. The constructor is in fact like any other instance method; it is invoked with the duplicated reference. Only after that is the reference stored in the method frame or in a field, where it becomes accessible from anywhere else. Before the last step, the reference to the object is present only on top of the creating thread's stack and nobody else can see it. In fact it makes no difference what kind of field you are working with; both will be initialized if TestClass.f != null. You can read the x and y fields from different objects, but this will not result in y = 0. For more information, see the JVM Specification and articles on stack-oriented programming languages.
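The four steps described here can be written out as comments next to the statement they come from. The class name BytecodeSteps is made up for this sketch, constant-pool indexes are omitted, and the listing describes the bytecode javac emits (which can be checked with javap -c); it says nothing about what a JIT compiler does with that bytecode afterwards, which is the point the other answers debate.

```java
final class BytecodeSteps {

    final int x;
    int y;
    static BytecodeSteps f;

    BytecodeSteps() {
        x = 3;
        y = 4;
    }

    static void writer() {
        // javac compiles the statement below to roughly:
        //   new           BytecodeSteps   // allocate, push the reference
        //   dup                           // duplicate the reference on the stack
        //   invokespecial <init>          // run the constructor, consuming one copy
        //   putstatic     f               // store the remaining copy in the static field
        f = new BytecodeSteps();
    }

    public static void main(String[] args) {
        writer();
        System.out.println("x = " + f.x + ", y = " + f.y);
    }
}
```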

UPD: One important thing I forgot to mention. Under the Java memory model there is no way to see a partially initialized object, provided, of course, that you do not publish this from inside the constructor.

JLS:

An object is considered to be completely initialized when its constructor finishes. A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields.

JLS:

There is a happens-before edge from the end of a constructor of an object to the start of a finalizer for that object.

A broader explanation of this point of view:

It turns out that the end of an object's constructor happens-before the execution of its finalize method. In practice, what this means is that any writes that occur in the constructor must be finished and visible to any reads of the same variable in the finalizer, just as if those variables were volatile.

UPD: That was the theory; let's turn to practice.

Consider the following code, with simple non-final variables:

public class Test {

    int myVariable1;
    int myVariable2;

    Test() {
        myVariable1 = 32;
        myVariable2 = 64;
    }

    public static void main(String args[]) throws Exception {
        Test t = new Test();
        System.out.println(t.myVariable1 + t.myVariable2);
    }
}

The following command displays the machine instructions generated by Java; how to use it is described in a wiki:

java.exe -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -Xcomp -XX:PrintAssemblyOptions=hsdis-print-bytes -XX:CompileCommand=print,*Test.main Test

Its output:

...
0x0263885d: movl   $0x20,0x8(%eax)    ;...c7400820 000000
                                    ;*putfield myVariable1
                                    ; - Test::<init>@7 (line 12)
                                    ; - Test::main@4 (line 17)
0x02638864: movl   $0x40,0xc(%eax)    ;...c7400c40 000000
                                    ;*putfield myVariable2
                                    ; - Test::<init>@13 (line 13)
                                    ; - Test::main@4 (line 17)
0x0263886b: nopl   0x0(%eax,%eax,1)   ;...0f1f4400 00
...

The field assignments are followed by a NOPL instruction; one of its purposes is to prevent instruction reordering.

Why does this happen? According to the specification, finalization happens after the constructor returns, so the GC thread can't see a partially initialized object. At the CPU level, a GC thread is no different from any other thread. If such guarantees are provided to the GC, then they are provided to any other thread. This is the most obvious way to satisfy such a restriction.

Results:

1) The constructor is not synchronized; synchronization is done by other instructions.

2) Assignment to the object's reference can't happen before the constructor returns.

What's going on in this thread? Why should that code fail in the first place?

You launch thousands of threads, each of which does the following:

TestClass.f = new TestClass();

What that does, in order:

  1. evaluate TestClass.f to find out its memory location
  2. evaluate new TestClass(): this creates a new instance of TestClass, whose constructor will initialize both x and y
  3. assign the right-hand value to the left-hand memory location

An assignment is an atomic operation which is always performed after the right-hand value has been generated. Here is a citation from the Java language spec (see the first bulleted point), but it really applies to any sane language.

This means that while the TestClass() constructor is taking its time to do its job, and x and y could conceivably still be zero, the reference to the partially initialized TestClass object only lives in that thread's stack, or CPU registers, and has not been written to TestClass.f.

Therefore TestClass.f will always contain:

  • either null, at the start of your program, before anything else is assigned to it,
  • or a fully initialized TestClass instance.
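The ordering described in this answer (the right-hand side is evaluated before the store) is easy to observe from inside the assigning thread. The sketch below uses an invented class, AssignmentOrderDemo, whose constructor records whether the static field was still null while the constructor was running. It only demonstrates what the assigning thread itself sees; it does not show what another thread may observe.

```java
final class AssignmentOrderDemo {

    static AssignmentOrderDemo f;

    final boolean fWasNullInsideConstructor;

    AssignmentOrderDemo() {
        // Runs while the right-hand side of the assignment is still being evaluated.
        fWasNullInsideConstructor = (f == null);
    }

    static boolean run() {
        f = null;
        f = new AssignmentOrderDemo(); // the store to f happens after the constructor returns
        return f.fWasNullInsideConstructor;
    }

    public static void main(String[] args) {
        System.out.println("f was null during construction: " + run());
    }
}
```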
