Why do memory instructions take 4 cycles in ARM assembly?
Memory instructions such as ldr, str or b take 4 cycles each in ARM assembly.
Is it because each memory location is 4 bytes long?
ARM has a pipelined architecture. Each clock cycle advances the pipeline by one step (e.g. fetch/decode/execute/read...). Since the pipeline is continuously fed, the average time per instruction can approach 1 cycle, but the actual time for an individual instruction from 'fetch' through completion can be 3+ cycles. ARM has a good explanation on their website:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0222b/ch01s01s01.html
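To make the latency-versus-throughput distinction concrete, here is a minimal Python sketch of an idealized pipeline. It is a toy model, not a description of any particular ARM core: the function name `pipeline_cycles` and the default of 3 stages are assumptions chosen for illustration, and stalls, branches and memory delays are ignored.

```python
def pipeline_cycles(num_instructions: int, num_stages: int = 3) -> int:
    """Cycles needed to complete num_instructions on an ideal, stall-free pipeline.

    Assumption: every stage takes exactly one cycle and a new instruction
    enters the pipeline on every cycle.
    """
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles to travel through the
    # pipeline; each following instruction completes one cycle after the
    # one before it.
    return num_stages + (num_instructions - 1)


for n in (1, 10, 1000):
    total = pipeline_cycles(n)
    print(f"{n} instructions: {total} cycles, {total / n:.3f} cycles each on average")
```

A single instruction still takes the full 3 cycles in this model, while the average over 1000 instructions is about 1.002 cycles each, which is the sense in which the per-instruction cost "approaches 1 cycle".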
Memory latency adds another layer of complication to this idea. Many ARM cores employ a multi-level cache system which aims to have the most frequently used data available in the fewest cycles. Even a read from the fastest (L1) cache can involve several cycles of latency. The pipeline includes facilities to allow read requests to complete at a later time if the data is not used right away. It's easier to understand by way of example:
LDR R0,[R1]
MOV R2,R3 // Allow time for memory read to occur
ADD R4,R4,#200 // by interleaving other instructions
CMP R0,#0 // before trying to use the value
// By contrast, using the loaded value immediately causes a pipeline
// 'stall': cycles are wasted waiting for the data to become available.
LDR R0,[R1]
CMP R0,#0 // Wastes at least 1 cycle due to pipeline not having the data
The idea is to hide the inherent latencies in the pipeline and, if you can, hide additional latencies in the memory access by delaying the instructions that depend on the loaded register (aka instruction interleaving).
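The effect of interleaving can be sketched with a small in-order issue model in Python. Everything here is an illustrative assumption rather than a measurement of real hardware: the helper names `count_cycles` and `stall_cycles`, the tuple encoding of instructions, and the load-to-use latency of 2 cycles (real values depend on the core and on whether the access hits in cache).

```python
LOAD_USE_LATENCY = 2  # assumed: a loaded value is usable 2 cycles after LDR issues


def count_cycles(program, load_latency=LOAD_USE_LATENCY):
    """Return the number of issue cycles, including stalls, for a program.

    program is a list of (opcode, dest_register_or_None, source_registers).
    One instruction issues per cycle unless a source register is not ready.
    """
    ready_at = {}  # register -> first cycle at which its value can be read
    cycle = 0
    for opcode, dest, sources in program:
        # Stall until every source register holds its value.
        issue = max([cycle] + [ready_at.get(reg, 0) for reg in sources])
        latency = load_latency if opcode == "LDR" else 1
        if dest is not None:
            ready_at[dest] = issue + latency
        cycle = issue + 1
    return cycle


def stall_cycles(program):
    """Cycles lost to waiting, relative to one instruction per cycle."""
    return count_cycles(program) - len(program)


interleaved = [
    ("LDR", "R0", ["R1"]),
    ("MOV", "R2", ["R3"]),
    ("ADD", "R4", ["R4"]),
    ("CMP", None, ["R0"]),
]
back_to_back = [
    ("LDR", "R0", ["R1"]),
    ("CMP", None, ["R0"]),
]

print("interleaved stalls: ", stall_cycles(interleaved))
print("back-to-back stalls:", stall_cycles(back_to_back))
```

Under these assumptions the interleaved sequence issues without any stall, while the back-to-back sequence loses one cycle waiting for R0, matching the "at least 1 cycle" comment in the assembly example. A longer assumed load latency (for instance a cache miss) would increase the loss in the second case.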