ISB instruction in ARM Cortex M

Until now I have used 3 NOPs to "clean" the pipeline. Recently I came across the ISB instruction, which does that for me. Looking at the ARM Information Center, I noticed that this instruction takes 4 cycles (on Cortex-M0), while the 3 NOPs take only 3.

Why should I use this instruction? How is it different from the 3 NOPs?

Here's the problem with NOP ( http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0552a/CHDJJGFB.html ):

Operation

NOP performs no operation and is not guaranteed to be time consuming. The processor might remove it from the pipeline before it reaches the execution stage.

Use NOP for padding, for example to place the subsequent instructions on a 64-bit boundary.

The same information appears in the documentation for other ARM Cortex devices, so using that instruction for any purpose other than padding is not reliable at all. The only guarantee you have is that this instruction will occupy 2 bytes (nop) or 4 bytes (nop.w) and that it will not perform any operation. Nothing more.

You should use the ISB instruction to ensure the pipeline is clear. As the comments above state, the pipeline can differ between ARM processors (for example, an M7 has a 6-stage pipeline vs. a 3-stage pipeline for an M3/M4). According to the M4 technical reference manual, "For ISB, the minimum number of cycles is equivalent to the number required for a pipeline refill."
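To make that concrete, here is a minimal sketch of wrapping ISB in a helper, assuming a GCC or Clang toolchain. The names instruction_sync_barrier, fake_control_reg and write_control_and_sync are hypothetical and only for illustration; in a real project you would normally call the CMSIS __ISB() intrinsic instead. The variable stands in for a real context-changing write such as MSR CONTROL, and on a non-ARM host the helper degrades to a compiler-only barrier so the file still builds.

```c
#include <stdint.h>

/* Hypothetical helper (not a CMSIS API): emits ISB on M-profile ARM targets,
 * and falls back to a compiler-only barrier on any other host. */
static inline void instruction_sync_barrier(void)
{
#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
    __asm__ volatile ("isb 0xF" ::: "memory");
#else
    __asm__ volatile ("" ::: "memory");  /* host build: no pipeline flush */
#endif
}

/* Hypothetical stand-in for a register whose update changes how
 * the following instructions execute (e.g. CONTROL on real hardware). */
static volatile uint32_t fake_control_reg;

static uint32_t write_control_and_sync(uint32_t value)
{
    fake_control_reg = value;       /* on hardware: MSR CONTROL, value */
    instruction_sync_barrier();     /* later instructions see the new state */
    return fake_control_reg;
}
```

The point of the wrapper is that the barrier is one named operation, so nobody has to count NOPs or know the pipeline depth of a particular core.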

Quite why this is 4 cycles and not 3 I am unsure; it possibly has something to do with ensuring the branch prediction logic is correct. Regardless of whether you want your code to be portable, I would advise using what ARM provides for the job: if they think you need 4 cycles, then I expect you do. You might get erroneous operation under some circumstances if you only have 3.

The reason the ISB instruction takes 4 cycles is very simple. The Cortex-M instruction set is a mixture of 16-bit and 32-bit instructions. Six 32-bit instructions are supported in ARMv6-M Cortex-M designs (e.g. Cortex-M0): BL, MSR, MRS, ISB, DMB, DSB.

All six of these instructions can be mixed in among 16-bit instructions.

The question is how the processor knows which instruction is 16 bits wide and which is 32 bits wide. To answer this, the processor reads the first 16 bits and decodes them (1 cycle). If the opcode matches a 32-bit instruction, it knows that the next 16 bits are actually the second half of a 32-bit instruction and executes the whole instruction (3 cycles).

That makes ALL 32-bit instructions on such Cortex-M cores take 1 + 3 = 4 cycles.
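The width check itself can be sketched in C. In the Thumb encoding, a halfword whose top five bits are 0b11101, 0b11110 or 0b11111 is the first half of a 32-bit instruction; anything else is a complete 16-bit instruction. The function names below are hypothetical, and this only models the decode decision, not cycle timing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if this halfword is the first half of a 32-bit Thumb
 * instruction, i.e. bits [15:11] are 0b11101, 0b11110 or 0b11111. */
static bool thumb_is_32bit_prefix(uint16_t first_halfword)
{
    unsigned top5 = (unsigned)first_halfword >> 11;
    return top5 == 0x1Du || top5 == 0x1Eu || top5 == 0x1Fu;
}

/* Size in bytes of the instruction that starts with this halfword. */
static unsigned thumb_instruction_bytes(uint16_t first_halfword)
{
    return thumb_is_32bit_prefix(first_halfword) ? 4u : 2u;
}
```

For example, ISB is encoded as the halfwords 0xF3BF 0x8F6F, so its first halfword is classified as a 32-bit prefix, while NOP (0xBF00) is a plain 16-bit instruction.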

To flush the pipeline you can use 3 NOPs if you are sure about the core implementation. You must be sure that the core has neither branch prediction nor on-the-fly instruction optimization that removes consecutive NOPs. If you are sure these features are absent, then use 3 NOP instructions and you will save 1 cycle. But if you are not sure, or you also want your ARM code to be portable to other architectures such as ARMv7, then you must use the ISB instruction, which is a 32-bit instruction and takes 4 cycles.

The 3 NOPs are not guaranteed to consume 3 cycles. There are certainly scenarios where they would consume 2 cycles on Cortex-M3 - and in these scenarios you might need to use more than 3 NOPs to get the effect you need.

It's likely that the 'interesting' scenarios won't occur in common code, or that they also require specific timings of other events, so you are unlikely to observe them. The critical point is that there is no guarantee, and observability of this sort of failure is often low.

Even if you use only 2 NOPs somewhere by mistake, your code will probably work most of the time, until perhaps a change somewhere else affects alignment and exposes a failure.

Definitive info on "cleaning" the pipeline is here:

"ARM Cortex™-M Programming Guide to Memory Barrier Instructions" Application Note 321

It applies to: Cortex-M3, Cortex-M4, Cortex-M0, Cortex-M0+, and Cortex-M1 processors.

How to clean the pipeline depends on which instructions are being used, e.g. __enable_irq(), __WFI() (sleep), etc.
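As a rough sketch of what "depends on the instruction" means, here are the two sequences as I read the application note: ISB after CPSIE I when a pending interrupt must be taken before the next instruction, and DSB before WFI. This assumes GCC or Clang. The EMIT macro, the issued trace and both function names are hypothetical, and on a non-ARM host the macro only records the mnemonics so the ordering can be inspected. Check the note itself before relying on this.

```c
#include <stddef.h>

#if defined(__ARM_ARCH_PROFILE) && (__ARM_ARCH_PROFILE == 'M')
/* Target build: emit the real instruction. */
#define EMIT(insn) __asm__ volatile (insn ::: "memory")
#else
/* Host build (assumption for illustration): record the mnemonic only. */
static const char *issued[8];
static size_t issued_count;
#define EMIT(insn) \
    do { if (issued_count < 8) issued[issued_count++] = (insn); } while (0)
#endif

/* Enable interrupts, then make sure any pending interrupt is taken
 * before the instructions that follow. */
static void enable_irq_and_take_pending(void)
{
    EMIT("cpsie i");
    EMIT("isb");
}

/* Complete outstanding memory accesses before entering sleep. */
static void sleep_until_interrupt(void)
{
    EMIT("dsb");
    EMIT("wfi");
}
```

With CMSIS the same sequences would be __enable_irq(); __ISB(); and __DSB(); __WFI(); respectively. Note that neither sequence uses NOP.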

From what I can tell, that doc does not mention the use of NOP to clean the pipeline.
