
Why is accessing a volatile variable about 100 times slower than accessing a member?

Here is a test I wrote to compare the access speed of a local variable, a member, and a volatile member:

public class VolatileTest {

public int member = -100;

public volatile int volatileMember = -100;

public static void main(String[] args) {
    int testloop = 10;
    for (int i = 1; i <= testloop; i++) {
        System.out.println("Round:" + i);
        VolatileTest vt = new VolatileTest();
        vt.runTest();
        System.out.println();
    }
}

public void runTest() {
    int local = -100;

    int loop = 1;
    int loop2 = Integer.MAX_VALUE;
    long startTime;

    startTime = System.currentTimeMillis();
    for (int i = 0; i < loop; i++) {
        for (int j = 0; j < loop2; j++) {
        }
        for (int j = 0; j < loop2; j++) {
        }
    }
    System.out.println("Empty:" + (System.currentTimeMillis() - startTime));

    startTime = System.currentTimeMillis();
    for (int i = 0; i < loop; i++) {
        for (int j = 0; j < loop2; j++) {
            local++;
        }
        for (int j = 0; j < loop2; j++) {
            local--;
        }
    }
    System.out.println("Local:" + (System.currentTimeMillis() - startTime));

    startTime = System.currentTimeMillis();
    for (int i = 0; i < loop; i++) {
        for (int j = 0; j < loop2; j++) {
            member++;
        }
        for (int j = 0; j < loop2; j++) {
            member--;
        }
    }
    System.out.println("Member:" + (System.currentTimeMillis() - startTime));

    startTime = System.currentTimeMillis();
    for (int i = 0; i < loop; i++) {
        for (int j = 0; j < loop2; j++) {
            volatileMember++;
        }
        for (int j = 0; j < loop2; j++) {
            volatileMember--;
        }
    }
    System.out.println("VMember:" + (System.currentTimeMillis() - startTime));

}
}

And here is the result on my X220 (i5 CPU):

Round:1 Empty:5 Local:10 Member:312 VMember:33378

Round:2 Empty:31 Local:0 Member:294 VMember:33180

Round:3 Empty:0 Local:0 Member:306 VMember:33085

Round:4 Empty:0 Local:0 Member:300 VMember:33066

Round:5 Empty:0 Local:0 Member:303 VMember:33078

Round:6 Empty:0 Local:0 Member:299 VMember:33398

Round:7 Empty:0 Local:0 Member:305 VMember:33139

Round:8 Empty:0 Local:0 Member:307 VMember:33490

Round:9 Empty:0 Local:0 Member:350 VMember:35291

Round:10 Empty:0 Local:0 Member:332 VMember:33838

It surprised me that access to a volatile member is 100 times slower than access to a normal member. I know that volatile members have some special properties: a modification is immediately visible to all threads, and an access to a volatile variable acts as a "memory barrier". But can these side effects really be the main cause of a 100-fold slowdown?

PS: I also ran the test on a Core 2 CPU machine. There the Member to VMember ratio is about 9:50, so volatile is about 5 times slower. It seems this is also related to the CPU architecture. 5 times is still a lot, right?

Volatile members are never cached, so they are read directly from main memory.

Access to a volatile variable prevents the CPU from reordering the instructions before and after the access, and this generally slows down execution.
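To make the ordering guarantee concrete, here is a minimal sketch of the usual "publish through a volatile flag" pattern. The class and method names are invented for illustration and do not come from the question. The plain write to data cannot be reordered after the volatile write to ready, so a reader that observes ready == true is guaranteed to see data == 42.

```java
public class VolatilePublish {
    private int data;                // plain field, written before the flag
    private volatile boolean ready;  // volatile flag that publishes data

    void writer() {
        data = 42;     // (1) plain write
        ready = true;  // (2) volatile write: (1) cannot be moved after it
    }

    int reader() {
        while (!ready) {
            // busy-wait; each check is a volatile read, so it is not hoisted out of the loop
        }
        return data;   // happens-after (2), so this reads 42
    }

    // Runs one reader thread against a writer and returns what the reader saw.
    static int demo() {
        VolatilePublish p = new VolatilePublish();
        int[] seen = new int[1];
        Thread r = new Thread(() -> seen[0] = p.reader());
        r.start();
        p.writer();
        try {
            r.join();  // join makes the reader's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(demo());  // prints 42
    }
}
```

If ready were a plain boolean, the reader could in principle spin forever or observe a stale data value, because nothing would forbid the compiler or CPU from reordering or caching those accesses.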

Access to a volatile prevents some JIT optimisations. This is especially important if you have a loop which doesn't really do anything, as the JIT can optimise such loops away (unless you have a volatile field). If you run the loops for longer, the discrepancy should increase further.
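As a rough sanity check of the JIT argument, the Round 1 timings from the question can be converted into time per field update. This sketch assumes that each timed section performs 2 × Integer.MAX_VALUE updates (one incrementing loop and one decrementing loop); the class and method names are made up for illustration.

```java
public class PerOpCost {
    // Each timed section runs two inner loops of Integer.MAX_VALUE iterations.
    static final long OPS = 2L * Integer.MAX_VALUE;

    // Converts a measured wall time in milliseconds to nanoseconds per update.
    static double nsPerOp(long millis) {
        return millis * 1_000_000.0 / OPS;
    }

    public static void main(String[] args) {
        // Round 1 figures reported in the question.
        System.out.printf("Member:  %.3f ns/op%n", nsPerOp(312));
        System.out.printf("VMember: %.3f ns/op%n", nsPerOp(33378));
        System.out.printf("Ratio:   %.0f%n", 33378.0 / 312.0);
    }
}
```

With these figures the plain member comes out at roughly 0.07 ns per update, which is less than a single clock cycle on a CPU running at a few GHz. That suggests the JIT did not execute one load and store per iteration for the plain field. The volatile member comes out at roughly 7.8 ns per update.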

In a more realistic test, you might expect volatile to be between 30% and 10x slower for critical code. In most real programs it makes very little difference, because the CPU is smart enough to "realise" that only one core is using the volatile field and caches it rather than using main memory.
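One way to get somewhat closer to a realistic measurement is sketched below, under several assumptions: the work is moved into small methods so that they are compiled separately, early rounds are treated as warm-up, and every result is consumed so it is not dead code. All names are hypothetical, and the JIT may still simplify the plain loop, so treat the numbers as indicative only; a dedicated harness such as JMH is the usual tool for this kind of test.

```java
public class VolatileCost {
    private int plain;
    private volatile int vol;

    // Adds n increments to the plain field and returns the running total.
    int bumpPlain(int n) {
        for (int i = 0; i < n; i++) {
            plain++;
        }
        return plain;
    }

    // Same loop on the volatile field: a volatile read and a volatile write per iteration.
    int bumpVolatile(int n) {
        for (int i = 0; i < n; i++) {
            vol++;
        }
        return vol;
    }

    public static void main(String[] args) {
        VolatileCost c = new VolatileCost();
        int n = 100_000_000;
        long sink = 0;  // printed at the end so the results are actually used
        for (int round = 1; round <= 5; round++) {  // the first rounds act as warm-up
            long t0 = System.nanoTime();
            sink += c.bumpPlain(n);
            long t1 = System.nanoTime();
            sink += c.bumpVolatile(n);
            long t2 = System.nanoTime();
            System.out.printf("Round %d: plain %d ms, volatile %d ms%n",
                    round, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
        System.out.println("sink=" + sink);
    }
}
```

System.nanoTime() is used instead of System.currentTimeMillis() because it is intended for measuring elapsed time, which matters when a section finishes in a few milliseconds.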

Using volatile reads from memory directly, so that every CPU core sees the change the next time it reads the variable. No CPU cache is used, which means neither registers nor the L1 to L3 caches. Typical read latencies are:

  1. Register: 1 clock cycle
  2. L1 cache: 4 clock cycles
  3. L2 cache: 11 clock cycles
  4. L3 cache: 30~40 clock cycles
  5. Memory: 100+ clock cycles

That's why your result is about 100 times slower when using volatile.
