
Why do memory instructions take 4 cycles in ARM assembly?

Memory instructions such as ldr, str or b take 4 cycles each in ARM assembly.

Is it because each memory location is 4 bytes long?

ARM has a pipelined architecture. Each clock cycle advances the pipeline by one step (e.g. fetch/decode/execute/read...). Since the pipeline is continuously fed, the average time per instruction can approach 1 cycle, but the actual time for an individual instruction from 'fetch' through completion can be 3+ cycles. ARM has a good explanation on their website:

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0222b/ch01s01s01.html
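The difference between per-instruction latency and overall throughput can be put into numbers with a small model. This is only a sketch in Python, not a description of any particular ARM core: the function name total_cycles and the default of 3 stages are assumptions chosen for illustration, and the model ignores stalls, branches and memory latency entirely.

```python
def total_cycles(num_instructions: int, stages: int = 3) -> int:
    """Cycles to run a stall-free instruction stream through an ideal pipeline.

    `stages` is a hypothetical pipeline depth, not a figure for a real core.
    """
    if num_instructions == 0:
        return 0
    # The first instruction needs `stages` cycles to travel through the pipeline;
    # each later instruction completes one cycle after its predecessor.
    return stages + (num_instructions - 1)


for n in (1, 10, 1000):
    cycles = total_cycles(n)
    print(f"{n:5d} instructions: {cycles:5d} cycles, "
          f"{cycles / n:.3f} cycles per instruction")
```

In this model a single instruction takes the full 3 cycles, while 1000 instructions take 1002 cycles, so the average cost per instruction gets close to 1 even though no individual instruction became faster.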

Memory latency adds another layer of complication to this idea. ARM cores typically employ a multi-level cache system which aims to have the most frequently used data available in the fewest cycles. Even a read from the fastest (L1) cache involves several cycles of latency. The pipeline includes facilities to allow read requests to complete at a later time if the data is not used right away. It's easier to understand by way of example:

LDR R0,[R1]      // Start the memory read
MOV R2,R3        // Allow time for the memory read to occur
ADD R4,R4,#200   // by interleaving other instructions
CMP R0,#0        // before trying to use the value

// Using the loaded value immediately causes a pipeline 'stall'
// and wastes time waiting for the data to become available.
LDR R0,[R1]
CMP R0,#0        // Wastes at least 1 cycle because the pipeline does not have the data yet

The idea is to hide the inherent latencies in the pipeline and, if you can, hide additional latencies in the memory access by delaying dependencies on registers (aka instruction interleaving).
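The effect of interleaving on the two sequences above can be estimated with a toy in-order issue model. This is a rough Python sketch under stated assumptions, not a cycle-accurate simulation: the function count_cycles, the tuple layout, and the load latency of 2 cycles are all hypothetical, and real cores differ in latency, forwarding and issue width.

```python
def count_cycles(program, load_latency=2):
    """Issue one instruction per cycle, in order, stalling until sources are ready.

    Each entry is (text, destination register or None, source registers, is_load).
    `load_latency` is an assumed number of cycles before a loaded value is usable.
    """
    ready_at = {}  # register -> first cycle in which its value can be consumed
    cycle = 0      # earliest cycle in which the next instruction may issue
    for text, dest, sources, is_load in program:
        # Wait for every source register written by an earlier instruction.
        issue = max([cycle] + [ready_at.get(reg, 0) for reg in sources])
        if dest is not None:
            ready_at[dest] = issue + (load_latency if is_load else 1)
        cycle = issue + 1
    return cycle


interleaved = [
    ("LDR R0,[R1]",    "R0", ["R1"], True),
    ("MOV R2,R3",      "R2", ["R3"], False),
    ("ADD R4,R4,#200", "R4", ["R4"], False),
    ("CMP R0,#0",      None, ["R0"], False),
]
back_to_back = [
    ("LDR R0,[R1]", "R0", ["R1"], True),
    ("CMP R0,#0",   None, ["R0"], False),
]

for name, program in (("interleaved", interleaved), ("back to back", back_to_back)):
    cycles = count_cycles(program)
    print(f"{name}: {len(program)} instructions, {cycles} issue cycles, "
          f"{cycles - len(program)} stall cycles")
```

With the assumed latency of 2, the interleaved sequence issues 4 instructions in 4 cycles with no stalls, while the back-to-back sequence needs 3 cycles for 2 instructions, which is the single wasted cycle mentioned in the comment. Raising load_latency makes the back-to-back version stall longer, whereas the interleaved version only starts to stall once the latency exceeds the number of independent instructions placed after the load.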
