
Statically scheduling OOO processors

The LLVM MISched instruction scheduler uses declarative TableGen descriptions of the processor's functional units, pipelines and latencies. Imagine trying to derive, from those declarations alone, the equivalent of the coding guidelines in Intel's Optimization Reference Manual.

Broadly speaking, what are the objectives and techniques for statically scheduling OOO processors? For an OOO processor, when would the compiler schedule an instruction A before B, and when would it schedule A after B?

Superscalar processors can execute more than one instruction at a time. An in-order processor only considers instructions in their original program order. An out-of-order (OOO) processor can execute instructions out of order and then commit the results in order. Speculation doesn't matter for this question, but I assume these processors are pipelined. Think A53 (in-order) and Haswell (OOO).

Which instruction an OOO processor executes next is a scheduling decision made by the processor at run time, so this is usually called dynamic scheduling. Which instruction an in-order processor executes next was decided by the compiler when the program was compiled, so this is usually called static scheduling.
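To make "static scheduling" concrete, here is a toy version of what a compiler's list scheduler does for an in-order pipeline. It is a minimal sketch, not how LLVM MISched is implemented. The assumptions: a single-issue, in-order core that stalls whenever an operand is not ready, and a made-up block of two loads (latency 4), two dependent adds (latency 1) and a final multiply (latency 3). The scheduler picks, each cycle, the ready instruction with the greatest "height" (longest latency-weighted path to the end of the block). `Insn`, `inOrderCycles` and `listSchedule` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model (assumption): an instruction is just a latency in cycles plus the
// indices of earlier instructions whose results it consumes.
struct Insn {
    int latency;
    std::vector<int> deps;
};

// Cycles for a single-issue, in-order pipeline that runs `order` exactly as
// given, stalling whenever an operand is not ready yet.
int inOrderCycles(const std::vector<Insn>& prog, const std::vector<int>& order) {
    std::vector<int> ready(prog.size(), 0);  // cycle each result becomes available
    int cycle = 0, finish = 0;
    for (int i : order) {
        int start = cycle;
        for (int d : prog[i].deps) start = std::max(start, ready[d]);
        ready[i] = start + prog[i].latency;
        finish = std::max(finish, ready[i]);
        cycle = start + 1;  // the next instruction issues no earlier than next cycle
    }
    return finish;
}

// Cycle-driven list scheduler: each cycle, pick the instruction whose operands
// are available with the greatest height (critical-path priority).
std::vector<int> listSchedule(const std::vector<Insn>& prog) {
    const int n = static_cast<int>(prog.size());
    std::vector<int> height(n, 0);
    for (int i = n - 1; i >= 0; --i) {
        int tail = 0;  // tallest user of instruction i
        for (int j = i + 1; j < n; ++j)
            for (int d : prog[j].deps)
                if (d == i) tail = std::max(tail, height[j]);
        height[i] = prog[i].latency + tail;
    }
    std::vector<int> ready(n, -1);  // -1 = not scheduled yet
    std::vector<int> order;
    for (int cycle = 0; static_cast<int>(order.size()) < n; ++cycle) {
        int pick = -1;
        for (int i = 0; i < n; ++i) {
            if (ready[i] >= 0) continue;  // already scheduled
            bool operandsReady = true;
            for (int d : prog[i].deps) {
                assert(d < i && "deps must refer to earlier instructions");
                if (ready[d] < 0 || ready[d] > cycle) operandsReady = false;
            }
            if (operandsReady && (pick < 0 || height[i] > height[pick])) pick = i;
        }
        if (pick < 0) continue;  // nothing ready this cycle: a stall
        ready[pick] = cycle + prog[pick].latency;
        order.push_back(pick);
    }
    return order;
}
```

For the block `{{4, {}}, {1, {0}}, {4, {}}, {1, {2}}, {3, {1, 3}}}`, source order takes 13 cycles in this model. The list scheduler hoists the second load above the first add, giving the order 0, 2, 1, 3, 4 and 9 cycles. An OOO core would find that overlap by itself at run time, which is why the question of what the compiler should still do arises.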

However, compilers also statically target and schedule OOO processors. In both the in-order and the OOO case, a compiler can look at a large window of instructions, it has to deal with register pressure, and it wants to keep the functional units busy. OOO processors can typically also rename registers, which removes the false dependencies created by register reuse and so makes register pressure less of a scheduling constraint.

Given that an OOO processor schedules instructions dynamically, what should the ahead-of-time compiler do to help it?

You are generally correct, but compile-time scheduling can still slightly improve execution speed. This is because the compiler can arrange instructions to speed up decoding (older x86 variants could decode multiple instructions in parallel only if the sequence satisfied certain constraints) or to pack them more tightly into the processor's instruction buffer. To cite Robert Morgan's "Building an Optimizing Compiler":

The compiler should schedule the insns as if the processor were
not an out-of-order execution processor.  The more effective this
schedule is, the larger the size of the effective insns buffer.
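The quote's point is that an OOO core only chooses among the instructions currently in its window. If the compiler has already placed independent work close together, the same window covers more useful parallelism. A minimal sketch under loud assumptions: a hypothetical single-issue core whose scheduler may only pick from the `window` oldest not-yet-issued instructions. This is a simplification of a real reorder buffer and reservation stations. The workload is two independent chains of four 3-cycle ops each, in two program orders. `Op` and `windowedCycles` are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model (assumption): latency plus indices of earlier ops it depends on.
struct Op {
    int latency;
    std::vector<int> deps;
};

// Single-issue toy OOO core: each cycle it issues the oldest op whose operands
// are ready, but it can only see the `window` oldest ops not yet issued.
int windowedCycles(const std::vector<Op>& prog, int window) {
    const int n = static_cast<int>(prog.size());
    std::vector<int> ready(n, -1);  // -1 = not issued; else cycle result is ready
    int issued = 0, cycle = 0, finish = 0;
    while (issued < n) {
        int seen = 0;  // unissued ops examined so far this cycle
        for (int i = 0; i < n && seen < window; ++i) {
            if (ready[i] >= 0) continue;  // issued ops leave the window
            ++seen;
            bool ok = true;
            for (int d : prog[i].deps) {
                assert(d < i && "deps must refer to earlier ops");
                if (ready[d] < 0 || ready[d] > cycle) ok = false;
            }
            if (ok) {
                ready[i] = cycle + prog[i].latency;
                finish = std::max(finish, ready[i]);
                ++issued;
                break;  // single issue per cycle
            }
        }
        ++cycle;
    }
    return finish;
}
```

With `chained = {{3,{}},{3,{0}},{3,{1}},{3,{2}},{3,{}},{3,{4}},{3,{5}},{3,{6}}}` (chain by chain) and `interleaved = {{3,{}},{3,{}},{3,{0}},{3,{1}},{3,{2}},{3,{3}},{3,{4}},{3,{5}}}`, this model gives the following:

- With a 2-entry window, the chained order takes 19 cycles and the interleaved order 13.
- With an 8-entry window, the chained order also takes 13 cycles, because the hardware finds the interleaving itself.

In this toy, the static order only matters when independent work sits farther apart than the window can see.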

In practice the wins are usually quite small (a few percent).
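The decode constraint mentioned in this answer can also be modelled in a few lines. This is a simplified sketch of the P6-family "4-1-1" pattern: three decoders per cycle, where only the first can handle an instruction that decodes into more than one uop. It deliberately ignores instruction length, fetch-block boundaries, microcoded instructions and data dependencies. Each instruction is represented only by a hypothetical uop count, and `decodeCycles` is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified "4-1-1" decoder (assumption, not an exact hardware model):
// slot 0 accepts an instruction of 1-4 uops, slots 1 and 2 accept only
// single-uop instructions, and instructions are decoded in program order.
int decodeCycles(const std::vector<int>& uops) {
    int cycles = 0;
    std::size_t i = 0;
    while (i < uops.size()) {
        ++cycles;
        assert(uops[i] >= 1 && uops[i] <= 4 && "microcoded instructions not modelled");
        ++i;  // slot 0: the complex decoder takes the next instruction
        // slots 1-2: simple decoders; a multi-uop instruction ends the group
        for (int slot = 1; slot < 3 && i < uops.size() && uops[i] == 1; ++slot) ++i;
    }
    return cycles;
}
```

The uop sequence 1, 1, 2, 2 needs 3 decode cycles in this model. Reordered to 2, 1, 1, 2, it needs 2, because each multi-uop instruction now lands in slot 0. That is the kind of gain a scheduler tuned for such a decoder can get, provided the reordering respects dependencies.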

It really isn't a scheduling decision per se but rather an optimization. Looking at the passes that LLVM's addILPOpts() adds for OOO superscalar backends gives a good idea of what is possible. Early if-conversion, for example, generates code that can run in parallel instead of code that must run serially.

LLVM has the EarlyIfConverter pass for OOO superscalars. It is used by the PowerPC, X86, AMDGPU, SystemZ and AArch64 backends. EarlyIfConverter arranges for both expressions to be evaluated in parallel and inserts a select to choose one of them: TII->insertSelect(...)

 // Early if-conversion is for out-of-order CPUs that don't have a lot of
 // predicable instructions. The goal is to eliminate conditional branches that
 // may mispredict.
 //
 // Instructions from both sides of the branch are executed speculatively, and a
 // cmov instruction selects the result.

This pass is added by the backend in addILPOpts(). It uses ILP to evaluate the two alternatives in parallel, rather than branching to evaluate only one of them.
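The shape of the transformation can be illustrated at the source level. EarlyIfConverter itself works on machine instructions, not C++. Both functions below compute the same value. The second evaluates both arms unconditionally and ends with a select, which is the form the pass produces (for example a cmov on x86). Whether a particular compiler emits a branch or a conditional move for either function depends on its own heuristics, so treat this as a sketch of the idea rather than a statement about codegen. The function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Branchy form: only one arm runs, but a mispredicted branch makes an OOO
// core discard the work it did down the wrong path.
std::int64_t branchy(bool cond, std::int64_t a, std::int64_t b) {
    std::int64_t r;
    if (cond)
        r = a * 3 + 1;
    else
        r = b - 7;
    return r;
}

// If-converted form: both arms are computed unconditionally (independent work
// the OOO core can overlap), then a select picks the result. This is only
// legal because neither arm has side effects or can trap.
std::int64_t ifConverted(bool cond, std::int64_t a, std::int64_t b) {
    std::int64_t t = a * 3 + 1;  // "then" arm, always executed
    std::int64_t f = b - 7;      // "else" arm, always executed
    return cond ? t : f;         // the select
}

// Sanity check that both forms agree on a small grid of inputs.
void checkEquivalent() {
    for (std::int64_t a = -3; a <= 3; ++a)
        for (std::int64_t b = -3; b <= 3; ++b)
            for (bool c : {false, true})
                assert(branchy(c, a, b) == ifConverted(c, a, b));
}
```

The trade-off is the one the pass has to weigh. The if-converted form always pays for both arms but never mispredicts, so it wins when the arms are short and the branch is hard to predict.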


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM