Why doesn't the JVM emit prefetch instructions on Windows x86?

Question

As the title states, why doesn't the OpenJDK JVM emit prefetch instructions on Windows x86? See the OpenJDK Mercurial repository: http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/c49dcaf78a65/src/os_cpu/windows_x86/vm/prefetch_windows_x86.inline.hpp

inline void Prefetch::read (void *loc, intx interval) {}
inline void Prefetch::write(void *loc, intx interval) {}

There are no comments, and I've found no resources besides the source code. I am asking because it does emit them on Linux x86; see http://hg.openjdk.java.net/jdk8u/jdk8u/hotspot/file/c49dcaf78a65/src/os_cpu/linux_x86/vm/prefetch_linux_x86.inline.hpp

inline void Prefetch::read (void *loc, intx interval) {
#ifdef AMD64
  __asm__ ("prefetcht0 (%0,%1,1)" : : "r" (loc), "r" (interval));
#endif // AMD64
}

inline void Prefetch::write(void *loc, intx interval) {
#ifdef AMD64

  // Do not use the 3dnow prefetchw instruction.  It isn't supported on em64t.
  //  __asm__ ("prefetchw (%0,%1,1)" : : "r" (loc), "r" (interval));
  __asm__ ("prefetcht0 (%0,%1,1)" : : "r" (loc), "r" (interval));

#endif // AMD64
}

Answer 1

As JDK-4453409 indicates, prefetching was implemented in the HotSpot JVM in JDK 1.4 to speed up GC. That was more than 15 years ago, so no one is likely to remember why it was not implemented on Windows. My guess is that Visual Studio (which has always been used to build HotSpot on Windows) simply did not support the prefetch instruction at the time. It looks like an opportunity for improvement.

Anyway, the code you asked about is used internally by the JVM garbage collector. It is not what the JIT generates. The C2 JIT code generator rules are in the architecture definition file x86_64.ad, which contains rules that translate PrefetchRead, PrefetchWrite and PrefetchAllocation nodes into the corresponding x64 instructions.

An interesting fact is that PrefetchRead and PrefetchWrite nodes are not created anywhere in the code. They exist only to support the Unsafe.prefetchX intrinsics; however, they are removed in JDK 9.

The only case in which the JIT generates a prefetch instruction is the PrefetchAllocation node. You can verify with -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly that PREFETCHNTA is indeed generated after object allocation, on both Linux and Windows.

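import java.util.Arrays;

class Test {
    public static void main(String[] args) {
        byte[] b = new byte[0];
        // Allocate a slightly larger array on every iteration so the JIT
        // compiles the allocation path (and its prefetches) in main.
        for (;;) {
            b = Arrays.copyOf(b, b.length + 1);
        }
    }
}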

java.exe -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly Test

# {method} {0x00000000176124e0} 'main' '([Ljava/lang/String;)V' in 'Test'
  ...
  0x000000000340e512: cmp    $0x100000,%r11d
  0x000000000340e519: ja     0x000000000340e60f
  0x000000000340e51f: movslq 0x24(%rsp),%r10
  0x000000000340e524: add    $0x1,%r10
  0x000000000340e528: add    $0x17,%r10
  0x000000000340e52c: mov    %r10,%r8
  0x000000000340e52f: and    $0xfffffffffffffff8,%r8
  0x000000000340e533: cmp    $0x100000,%r11d
  0x000000000340e53a: ja     0x000000000340e496
  0x000000000340e540: mov    0x60(%r15),%rbp
  0x000000000340e544: mov    %rbp,%r9
  0x000000000340e547: add    %r8,%r9
  0x000000000340e54a: cmp    0x70(%r15),%r9
  0x000000000340e54e: jae    0x000000000340e496
  0x000000000340e554: mov    %r9,0x60(%r15)
  0x000000000340e558: prefetchnta 0xc0(%r9)
  0x000000000340e560: movq   $0x1,0x0(%rbp)
  0x000000000340e568: prefetchnta 0x100(%r9)
  0x000000000340e570: movl   $0x200000f5,0x8(%rbp)  ;   {metadata({type array byte})}
  0x000000000340e577: mov    %r11d,0xc(%rbp)
  0x000000000340e57b: prefetchnta 0x140(%r9)
  0x000000000340e583: prefetchnta 0x180(%r9)    ;*newarray
                                                ; - java.util.Arrays::copyOf@1 (line 3236)
                                                ; - Test::main@9 (line 9)
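
Reading the listing, the four prefetchnta instructions target 0xc0, 0x100, 0x140 and 0x180 past %r9, which appears to hold the new allocation top (it is the value stored to 0x60(%r15) just before). That is four addresses 64 bytes apart, starting 192 bytes ahead. HotSpot has options named AllocatePrefetchDistance, AllocatePrefetchStepSize and AllocatePrefetchLines, and I assume they drive this pattern, but the values below are only read off the listing, not taken from any flag defaults. A minimal sketch of the arithmetic, with a hypothetical helper name:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not HotSpot code: computes the offsets (relative to
// the new allocation top) that a run of allocation prefetches would touch.
// 'distance' is how far ahead the first prefetch lands, 'step' is the gap
// between consecutive prefetches, 'lines' is how many are issued.
std::vector<std::size_t> prefetch_offsets(std::size_t distance,
                                          std::size_t step,
                                          std::size_t lines) {
  std::vector<std::size_t> offsets;
  offsets.reserve(lines);
  for (std::size_t i = 0; i < lines; ++i) {
    offsets.push_back(distance + i * step);
  }
  return offsets;
}
```

Calling prefetch_offsets(0xc0, 0x40, 4) yields the same four displacements that appear in the listing.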

Answer 2

The files you cited all contain inline assembler fragments, which HotSpot's own C/C++ code uses (mostly in GC code, as apangin, the JVM expert, pointed out). And there actually is a difference: the Linux, Solaris and BSD variants of x86_64 HotSpot have prefetches, while the Windows variant has them disabled or unimplemented. This is partly strange and partly unexplained, and it may make the JVM a bit slower on Windows (by a few percent; more on platforms without hardware prefetch), though it still would not help Sun/Oracle sell more Solaris or paid Solaris support contracts. Ross also guessed that inline asm syntax may not be supported by the MS C++ compiler, but _mm_prefetch should be. (Who will open a JDK bug to add it to the file?)
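
To make the _mm_prefetch idea concrete, here is a minimal sketch of what a non-empty implementation might look like. It is not the HotSpot code: PrefetchSketch and sum_with_prefetch are hypothetical names, intptr_t stands in for HotSpot's intx, and both read and write use the T0 hint only because the Linux file quoted in the question uses prefetcht0 for both. _mm_prefetch and _MM_HINT_T0 come from <xmmintrin.h>; on non-x86 targets the sketch falls back to a no-op.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(_M_X64) || defined(_M_IX86) || defined(__x86_64__) || defined(__i386__)
#include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_T0
#define SKETCH_HAS_SSE_PREFETCH 1
#else
#define SKETCH_HAS_SSE_PREFETCH 0
#endif

// Hypothetical stand-in for HotSpot's Prefetch class (assumption, not the
// real class). Prefetches the cache line at loc + interval.
struct PrefetchSketch {
  static inline void read(const void* loc, std::intptr_t interval) {
#if SKETCH_HAS_SSE_PREFETCH
    _mm_prefetch(static_cast<const char*>(loc) + interval, _MM_HINT_T0);
#else
    (void)loc; (void)interval;  // no-op where the intrinsic is unavailable
#endif
  }
  // Mirrors the Linux version, which also uses prefetcht0 for writes.
  static inline void write(void* loc, std::intptr_t interval) {
    read(loc, interval);
  }
};

// Usage example: sum an array while prefetching 'distance' elements ahead.
// The bounds check keeps the prefetched address inside the array.
long sum_with_prefetch(const long* data, std::size_t n, std::size_t distance) {
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (i + distance < n) {
      PrefetchSketch::read(&data[i],
                           static_cast<std::intptr_t>(distance * sizeof(long)));
    }
    sum += data[i];
  }
  return sum;
}
```

A prefetch is only a hint, so the result of sum_with_prefetch is the same with or without it; whether it helps performance would have to be measured.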

HotSpot is a JIT, and JITted code is emitted (generated) by the JIT as bytes. (While a JIT could copy code from its own functions into generated code or emit calls to support functions, HotSpot emits prefetches as bytes.) How can we find out how they are emitted? A simple way is to find an online searchable copy of jdk8u (or better, a cross-reference such as metager), for example on GitHub: https://github.com/JetBrains/jdk8u_hotspot, and search for prefetch, prefetch emit, prefetchr, or lir_prefetchr. There are some relevant results:

Actual bytes emitted for the JVM's C1 compiler / LIR, in jdk8u_hotspot/src/cpu/x86/vm/assembler_x86.cpp:

void Assembler::prefetch_prefix(Address src) {
  prefix(src);
  emit_int8(0x0F);
}

void Assembler::prefetchnta(Address src) {
  NOT_LP64(assert(VM_Version::supports_sse(), "must support"));
  InstructionMark im(this);
  prefetch_prefix(src);
  emit_int8(0x18);
  emit_operand(rax, src); // 0, src
}

void Assembler::prefetchr(Address src) {
  assert(VM_Version::supports_3dnow_prefetch(), "must support");
  InstructionMark im(this);
  prefetch_prefix(src);
  emit_int8(0x0D);
  emit_operand(rax, src); // 0, src
}

void Assembler::prefetcht0(Address src) {
  NOT_LP64(assert(VM_Version::supports_sse(), "must support"));
  InstructionMark im(this);
  prefetch_prefix(src);
  emit_int8(0x18);
  emit_operand(rcx, src); // 1, src
}

void Assembler::prefetcht1(Address src) {
  NOT_LP64(assert(VM_Version::supports_sse(), "must support"));
  InstructionMark im(this);
  prefetch_prefix(src);
  emit_int8(0x18);
  emit_operand(rdx, src); // 2, src
}

void Assembler::prefetcht2(Address src) {
  NOT_LP64(assert(VM_Version::supports_sse(), "must support"));
  InstructionMark im(this);
  prefetch_prefix(src);
  emit_int8(0x18);
  emit_operand(rbx, src); // 3, src
}

void Assembler::prefetchw(Address src) {
  assert(VM_Version::supports_3dnow_prefetch(), "must support");
  InstructionMark im(this);
  prefetch_prefix(src);
  emit_int8(0x0D);
  emit_operand(rcx, src); // 1, src
}
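
In these functions the register passed to emit_operand is not a real operand; its number (rax = 0, rcx = 1, rdx = 2, rbx = 3) fills the reg field of the ModRM byte, which selects the prefetch variant after the 0x0F 0x18 (or 0x0F 0x0D) opcode. The sketch below is my own simplified encoder, not HotSpot's: it handles only the base + 32-bit displacement form and rejects base registers that would need a SIB byte. The names PrefetchHint and encode_prefetch are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// ModRM reg-field values for the 0F 18 opcode group.
enum PrefetchHint { kNTA = 0, kT0 = 1, kT1 = 2, kT2 = 3 };

// Hypothetical simplified encoder: prefetch<hint> disp32(base_reg).
// base_reg is the x86-64 register number (0 = rax ... 15 = r15).
std::vector<std::uint8_t> encode_prefetch(PrefetchHint hint,
                                          unsigned base_reg,
                                          std::int32_t disp) {
  // rsp (4) and r12 (12) need a SIB byte, which this sketch does not emit.
  assert(base_reg < 16 && (base_reg & 7) != 4);
  std::vector<std::uint8_t> out;
  if (base_reg >= 8) {
    out.push_back(0x41);  // REX prefix with the B bit set, for r8..r15
  }
  out.push_back(0x0F);    // two-byte opcode escape
  out.push_back(0x18);    // prefetch group opcode
  // ModRM: mod = 10 (disp32), reg = hint, rm = low three bits of the base.
  out.push_back(static_cast<std::uint8_t>(0x80 | (hint << 3) | (base_reg & 7)));
  const std::uint32_t u = static_cast<std::uint32_t>(disp);
  for (int i = 0; i < 4; ++i) {
    out.push_back(static_cast<std::uint8_t>((u >> (8 * i)) & 0xFF));  // little-endian
  }
  return out;
}
```

For example, encode_prefetch(kNTA, 9, 0xc0) produces the eight bytes 41 0F 18 81 C0 00 00 00, which matches the 8-byte spacing between the prefetchnta 0xc0(%r9) instruction and the next one in the PrintAssembly listing from Answer 1.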

Usage in C1 LIR: src/share/vm/c1/c1_LIRAssembler.cpp

void LIR_Assembler::emit_op1(LIR_Op1* op) {
  switch (op->code()) { 
...
    case lir_prefetchr:
      prefetchr(op->in_opr());
      break;

    case lir_prefetchw:
      prefetchw(op->in_opr());
      break;

Now we know the opcode lir_prefetchr and can search for it and for lir_prefetchw (or look them up in the OpenGrok xref) to find the only use, in src/share/vm/c1/c1_LIR.cpp:

void LIR_List::prefetch(LIR_Address* addr, bool is_store) {
  append(new LIR_Op1(
            is_store ? lir_prefetchw : lir_prefetchr,
            LIR_OprFact::address(addr)));
}

Prefetch instructions are also defined in another place (for C2, as noted by apangin), src/cpu/x86/vm/x86_64.ad:

// Prefetch instructions. ...
instruct prefetchr( memory mem ) %{
  predicate(ReadPrefetchInstr==3);
  match(PrefetchRead mem);
  ins_cost(125);

  format %{ "PREFETCHR $mem\t# Prefetch into level 1 cache" %}
  ins_encode %{
    __ prefetchr($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}

instruct prefetchrNTA( memory mem ) %{
  predicate(ReadPrefetchInstr==0);
  match(PrefetchRead mem);
  ins_cost(125);

  format %{ "PREFETCHNTA $mem\t# Prefetch into non-temporal cache for read" %}
  ins_encode %{
    __ prefetchnta($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}

instruct prefetchrT0( memory mem ) %{
  predicate(ReadPrefetchInstr==1);
  match(PrefetchRead mem);
  ins_cost(125);

  format %{ "PREFETCHT0 $mem\t# prefetch into L1 and L2 caches for read" %}
  ins_encode %{
    __ prefetcht0($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}

instruct prefetchrT2( memory mem ) %{
  predicate(ReadPrefetchInstr==2);
  match(PrefetchRead mem);
  ins_cost(125);

  format %{ "PREFETCHT2 $mem\t# prefetch into L2 caches for read" %}
  ins_encode %{
    __ prefetcht2($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}

instruct prefetchwNTA( memory mem ) %{
  match(PrefetchWrite mem);
  ins_cost(125);

  format %{ "PREFETCHNTA $mem\t# Prefetch to non-temporal cache for write" %}
  ins_encode %{
    __ prefetchnta($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}

// Prefetch instructions for allocation.

instruct prefetchAlloc( memory mem ) %{
  predicate(AllocatePrefetchInstr==3);
  match(PrefetchAllocation mem);
  ins_cost(125);

  format %{ "PREFETCHW $mem\t# Prefetch allocation into level 1 cache and mark modified" %}
  ins_encode %{
    __ prefetchw($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}

instruct prefetchAllocNTA( memory mem ) %{
  predicate(AllocatePrefetchInstr==0);
  match(PrefetchAllocation mem);
  ins_cost(125);

  format %{ "PREFETCHNTA $mem\t# Prefetch allocation to non-temporal cache for write" %}
  ins_encode %{
    __ prefetchnta($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}

instruct prefetchAllocT0( memory mem ) %{
  predicate(AllocatePrefetchInstr==1);
  match(PrefetchAllocation mem);
  ins_cost(125);

  format %{ "PREFETCHT0 $mem\t# Prefetch allocation to level 1 and 2 caches for write" %}
  ins_encode %{
    __ prefetcht0($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}

instruct prefetchAllocT2( memory mem ) %{
  predicate(AllocatePrefetchInstr==2);
  match(PrefetchAllocation mem);
  ins_cost(125);

  format %{ "PREFETCHT2 $mem\t# Prefetch allocation to level 2 cache for write" %}
  ins_encode %{
    __ prefetcht2($mem$$Address);
  %}
  ins_pipe(ialu_mem);
%}
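
Each rule above is guarded by a predicate on ReadPrefetchInstr or AllocatePrefetchInstr, so the flag value decides which instruction C2 selects for a matching node. The following sketch just restates those predicates as lookup functions; the function names are hypothetical and nothing like them exists in HotSpot as far as I know.

```cpp
#include <cstring>

// Hypothetical helper: mnemonic chosen for a PrefetchAllocation node,
// following the AllocatePrefetchInstr predicates in the rules above.
const char* alloc_prefetch_mnemonic(int allocate_prefetch_instr) {
  switch (allocate_prefetch_instr) {
    case 0: return "PREFETCHNTA";
    case 1: return "PREFETCHT0";
    case 2: return "PREFETCHT2";
    case 3: return "PREFETCHW";
    default: return nullptr;  // no rule matches other values
  }
}

// Hypothetical helper: mnemonic chosen for a PrefetchRead node,
// following the ReadPrefetchInstr predicates in the rules above.
const char* read_prefetch_mnemonic(int read_prefetch_instr) {
  switch (read_prefetch_instr) {
    case 0: return "PREFETCHNTA";
    case 1: return "PREFETCHT0";
    case 2: return "PREFETCHT2";
    case 3: return "PREFETCHR";
    default: return nullptr;  // no rule matches other values
  }
}
```

Under these rules, the PREFETCHNTA instructions in the listing from Answer 1 would correspond to AllocatePrefetchInstr being 0 on that machine.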

Answer 1
8 2017-06-04 18:20:06

Answer 2
6 Accepted 2017-06-04 16:21:14
