Java local vs instance variable access speed

Question

So my question is about variable access speed in Java. Today in my "CS" class (if you can call it that), the teacher presented a List example similar to the following:

public class ListExample<T> {
    private Node<T> head;
    private Node<T> tail;

    private class Node<T> { /* ... */ }

    public void append(T content) {
        if (!isEmpty()) {
            Node<T> dummy = new Node<T>(content);
            head = dummy;
            tail = dummy;

            head.setNext(head);
            // or this
            dummy.setNext(dummy);

        } else { /* ... */ }
    }

    // more methods
    // ...
}

My question is: would the call to head.setNext(head) be slower than dummy.setNext(dummy), even if the difference is not noticeable? I was wondering because head is obviously an instance variable of the class and dummy is local, so would the local access be faster?

Answer 1

OK, I've written a micro-benchmark (as suggested by @Joni and @MattBall), and here are the results for 1 x 1000000000 accesses each for a local and an instance variable:

Average time for instance variable access: 5.08E-4
Average time for local variable access: 4.96E-4

For 10 x 1000000000 accesses each:

Average time for instance variable access: 4.723E-4
Average time for local variable access: 4.631E-4

For 100 x 1000000000 accesses each:

Average time for instance variable access: 5.050300000000002E-4
Average time for local variable access: 5.002400000000001E-4

So it seems that local variable accesses are indeed slightly faster than instance variable accesses (by roughly 1 to 2.5 percent in these runs), even if both point to the same object.

Note: I didn't look into this because of something I wanted to optimize; it was pure interest.

PS: Here is the code for the micro-benchmark:

public class AccessBenchmark {
    private final long N = 1000000000;
    private static final int M = 1;

    private LocalClass instanceVar;

    private class LocalClass {
        public void someFunc() {}
    }

    public double testInstanceVar() {
        // System.out.println("Running instance variable benchmark:");
        instanceVar = new LocalClass();

        long start = System.currentTimeMillis();
        for (int i = 0; i < N; i++) {
            instanceVar.someFunc();
        }

        long elapsed = System.currentTimeMillis() - start;

        double avg = (elapsed * 1000.0) / N;

        // System.out.println("elapsed time = " + elapsed + "ms");
        // System.out.println(avg + " microseconds per execution");

        return avg;
    }

    public double testLocalVar() {
        // System.out.println("Running local variable benchmark:");
        instanceVar = new LocalClass();
        LocalClass localVar = instanceVar;

        long start = System.currentTimeMillis();
        for (int i = 0 ; i < N; i++) {
            localVar.someFunc();
        }

        long elapsed = System.currentTimeMillis() - start;

        double avg = (elapsed * 1000.0) / N;

        // System.out.println("elapsed time = " + elapsed + "ms");
        // System.out.println(avg + " microseconds per execution");

        return avg;
    }

    public static void main(String[] args) {
        AccessBenchmark bench;

        double[] avgInstance = new double[M];
        double[] avgLocal = new double[M];

        for (int i = 0; i < M; i++) {
            bench = new AccessBenchmark();

            avgInstance[i] = bench.testInstanceVar();
            avgLocal[i] = bench.testLocalVar();

            System.gc();
        }

        double sumInstance = 0.0;
        for (double d : avgInstance) sumInstance += d;
        System.out.println("Average time for instance variable access: " + sumInstance / M);

        double sumLocal = 0.0;
        for (double d : avgLocal) sumLocal += d;
        System.out.println("Average time for local variable access: " + sumLocal / M);
    }
}
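A caveat on reading these numbers: someFunc() is empty, so the JIT compiler may inline and drop the call entirely, and System.currentTimeMillis() has coarse, platform-dependent resolution. Differences of a percent or two could be within measurement noise. The following is only a minimal sketch of a slightly more defensive variant, not a rigorous benchmark (a harness such as JMH is the usual tool for that). The names AccessBenchmarkSketch, Counter, runViaField and runViaLocal are made up for illustration; the sketch assumes that giving each call an observable side effect and adding a warm-up pass makes the timings somewhat more meaningful.

```java
public class AccessBenchmarkSketch {
    // Hypothetical helper type: the counter gives every call a visible effect,
    // so the loop body cannot simply be discarded as dead code.
    private static final class Counter {
        long count;
        void increment() { count++; }
    }

    private Counter instanceVar = new Counter();

    // Reads the field on every iteration (at the bytecode level).
    public long runViaField(int iterations) {
        for (int i = 0; i < iterations; i++) {
            instanceVar.increment();
        }
        return instanceVar.count;
    }

    // Copies the field into a local once, then uses only the local.
    public long runViaLocal(int iterations) {
        Counter localVar = instanceVar;
        for (int i = 0; i < iterations; i++) {
            localVar.increment();
        }
        return localVar.count;
    }

    public static void main(String[] args) {
        final int iterations = 100_000_000;
        AccessBenchmarkSketch bench = new AccessBenchmarkSketch();

        // Warm-up pass so both loops are likely compiled before timing starts.
        bench.runViaField(iterations);
        bench.runViaLocal(iterations);

        long t0 = System.nanoTime();
        long a = bench.runViaField(iterations);
        long t1 = System.nanoTime();
        long b = bench.runViaLocal(iterations);
        long t2 = System.nanoTime();

        // Printing the counts keeps the results live.
        System.out.println("field: " + (t1 - t0) / (double) iterations + " ns/op (count " + a + ")");
        System.out.println("local: " + (t2 - t1) / (double) iterations + " ns/op (count " + b + ")");
    }
}
```

Even with these changes, the JIT is free to keep the field value in a register across the loop, so both variants may compile to essentially the same machine code.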

Answer 2

In general, an access to an instance variable (of the this object) requires an aload_0 (to load this onto the top of the stack) followed by getfield. Referencing a local variable requires only the aload_n to pull the value out of its assigned slot in the stack frame.

Further, getfield must reference the class definition to determine where in the class (at what offset) the value is stored. This could take several additional hardware instructions.

Even with a JITC, it's unlikely that the local reference (which would normally be zero or one hardware operation) would ever be slower than the instance field reference (which would have to be at least one operation, maybe 2-3).

(Not that this matters all that much: the speed of both is quite good, and the difference could only become significant in very bizarre circumstances.)
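To make the bytecode difference concrete, here is a small illustrative class (FieldVsLocal and its members are hypothetical names, not from the question). The comments show what javac typically emits for each access; compiling the class and running javap -c FieldVsLocal should confirm it.

```java
public class FieldVsLocal {
    private int[] data = {1, 2, 3};

    // Field access: aload_0 pushes this, then getfield fetches data.
    public int firstViaField() {
        return data[0];       // aload_0, getfield, iconst_0, iaload
    }

    // Local access: after a one-time copy, each use is a single aload_1.
    public int firstViaLocal() {
        int[] local = data;   // aload_0, getfield, astore_1
        return local[0];      // aload_1, iconst_0, iaload
    }
}
```

This only shows what the interpreter sees. Once the method is JIT-compiled, the generated machine code for the two forms may well be identical.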

Answer 3

As noted in the comments, I don't think there's a difference in the time taken. I think what you might be referring to is better exemplified in the Java SE codebase. For example, in java.lang.String:

public void getBytes(int srcBegin, int srcEnd, byte dst[], int dstBegin) {
    //some code you can check out

    char[] val = value;
    while (i < n) {
        dst[j++] = (byte)val[i++];    /* avoid getfield opcode */
    }
}

In the above code, value is an instance variable. Since the while loop was going to access individual elements of value, they copied the reference from the heap into a local variable on the stack, thus optimizing the access.
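The same pattern can be sketched as a self-contained class. CharCopier and copyLowBytes are hypothetical names chosen for this illustration, not part of the JDK; the sketch only assumes that the field does not change during the loop.

```java
public class CharCopier {
    private final char[] value;

    public CharCopier(String s) {
        this.value = s.toCharArray();
    }

    // Copies the low byte of each char into dst, reading the field only once.
    public void copyLowBytes(byte[] dst) {
        char[] val = value;                        // single getfield before the loop
        int n = Math.min(val.length, dst.length);  // never write past either array
        for (int i = 0; i < n; i++) {
            dst[i] = (byte) val[i];                // loop body touches only locals
        }
    }
}
```

For example, new CharCopier("abc").copyLowBytes(new byte[3]) fills the array with 97, 98, 99. A modern JIT may perform this hoisting on its own, so the manual copy matters mostly in interpreted code or when the compiler cannot prove the field is unchanged.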

You can also check out the knowledge shared by Jon Skeet, Vivin and a few others on this answer.

Answer 4

From a microarchitecture perspective, reading a local variable may be cheaper because it's likely in a register or at least in the CPU cache. In general, reading an instance variable may cause an expensive cache miss. In this case, though, the variable was just written, so it will likely be in the cache anyway. You could write a micro-benchmark to find out if there's any difference.

Answer 5

I think using dummy might be, at the very most, 1 cycle faster, assuming it was left in a register, but it depends on the specific CPU architecture, what setNext looks like, and the JVM you're using, and it's really unpredictable how the code might look in its final JIT'd form. The JVM could potentially see that head == dummy, and if so, the executed code for both cases would be identical. This is much, much too tiny a case to worry about.

Answer 6

I can assure you that whatever performance gains one might get from this will be offset by the headache of looking at confusingly written code. Let the compiler figure this out. I will concede that, all things being equal, the local variable is probably slightly faster, if only because there are fewer bytecode instructions involved. However, who is to say that future versions of the JVM won't change this?

In short, write code that is easy to read first. If, after that, you have a performance concern, profile.

Answer 7

When in doubt, look at the generated bytecode:

public void append(java.lang.Object);
 Code:
  0:    new #2; //class ListExample$Node
  3:    dup
  4:    aload_0
  5:    aload_1
  6:    invokespecial   #3; //Method ListExample$Node."<init>":(LListExample;Ljava/lang/Object;)V
  9:    astore_2
  10:   aload_0
  11:   aload_2
  12:   putfield    #4; //Field head:LListExample$Node;
  15:   aload_0
  16:   aload_2
  17:   putfield    #5; //Field tail:LListExample$Node;
  20:   aload_0
  21:   getfield    #4; //Field head:LListExample$Node;
  24:   aload_0
  25:   getfield    #4; //Field head:LListExample$Node;
  28:   invokevirtual   #6; //Method ListExample$Node.setNext:(LListExample$Node;)V
  31:   aload_2
  32:   aload_2
  33:   invokevirtual   #6; //Method ListExample$Node.setNext:(LListExample$Node;)V
  36:   return

}

Either you get aload followed by getfield, or 2 x aload. Seems to me they would be identical.

Answer scores and dates (7 answers)

Answer 1: score 22, accepted, 2014-02-06 21:27:19
Answer 2: score 16, 2014-02-06 21:05:14
Answer 3: score 9, 2014-02-06 20:34:28
Answer 4: score 4, 2014-02-06 20:50:03
Answer 5: score -1, 2014-02-06 20:36:48
Answer 6: score -1, 2017-12-05 16:52:10
Answer 7: score -2, 2014-02-06 20:26:44
