
What is the processor LOCK# signal and how does it work?

I was reading an intermediate-level book about assembly, and it mentioned that some instructions, like xchg, automatically assert the processor LOCK# signal. Searching online only told me that it gives the processor exclusive access to any shared memory, with no specific details. That made me wonder how this exclusive access actually works.

  1. Does this mean that other devices, such as a GPU, can't access memory while the lock is held? And can other devices actually talk to RAM directly, without going through the CPU first?
  2. How does the processor know that it's in this locked state? Is it saved in a control register or in rflags, for example? I can't see how this works on a multicore CPU.
  3. The websites I visited said it locks any shared memory. Does this mean that the whole of RAM is locked during this period, or just the memory page (or some other part of memory) that the instruction operates on?

The basic problem is that some instructions read memory, modify the value read, then write a new value; and if the contents of memory change between the read and the write, (some) parallel code can end up in an inconsistent state.

A nice example is one CPU doing inc dword [foo] while another CPU does dec dword [foo]. After both instructions have executed, the value should be the same as it originally was. But both CPUs could read the old value, then both could modify it, then both could write their new value, leaving the value 1 higher or 1 lower than you'd expect.
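To make the interleaving concrete, here is a small deterministic sketch in Python. It is only a model, not real hardware behavior: the function run, the CPU names "A" and "B", and the starting value 5 are all made up for illustration. Each instruction is split into its read, modify and write steps, and the steps are replayed in a chosen order.

```python
# Deterministic model of the lost-update race. Each CPU's instruction is
# split into read / modify / write steps that are interleaved by hand.
# All names here (run, "A", "B", start=5) are hypothetical.

def run(schedule, start=5):
    """Replay a list of (cpu, step) pairs against one shared memory word."""
    memory = {"foo": start}
    regs = {}                      # per-CPU temporary holding the value read
    delta = {"A": +1, "B": -1}     # CPU A does inc, CPU B does dec
    for cpu, step in schedule:
        if step == "read":
            regs[cpu] = memory["foo"]
        elif step == "modify":
            regs[cpu] += delta[cpu]
        elif step == "write":
            memory["foo"] = regs[cpu]
    return memory["foo"]

# A finishes completely before B starts: the result is the original value.
serial = [("A", "read"), ("A", "modify"), ("A", "write"),
          ("B", "read"), ("B", "modify"), ("B", "write")]

# Both CPUs read the old value before either writes; B's write lands last.
racy_b_last = [("A", "read"), ("B", "read"),
               ("A", "modify"), ("B", "modify"),
               ("A", "write"), ("B", "write")]

# Same race, but A's write lands last.
racy_a_last = [("A", "read"), ("B", "read"),
               ("A", "modify"), ("B", "modify"),
               ("B", "write"), ("A", "write")]

print(run(serial), run(racy_b_last), run(racy_a_last))
```

With a starting value of 5, the serial schedule ends at 5, while the two racy schedules end at 4 and 6: one of the two updates is lost, and which one depends on whose write lands last.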

The solution was to use a #lock signal to prevent anything else from accessing the same piece of memory at the same time. E.g. the first CPU would assert #lock, do its read/modify/write, then de-assert #lock; anything else would see that #lock is asserted and have to wait until it is de-asserted before doing any memory access. In other words, it's a simple form of mutual exclusion (like a spinlock, but in hardware).
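As a rough software analogy (a sketch, not how the hardware is built), a single Python threading.Lock can stand in for the one #lock signal. The names bus_lock, locked_add and demo are invented for this example. Note that a Python lock puts waiters to sleep rather than stalling them on a bus, so only the mutual-exclusion aspect carries over.

```python
import threading

# One global lock stands in for the single #lock signal (hypothetical model).
bus_lock = threading.Lock()
memory = {"foo": 0}

def locked_add(delta, times):
    """Do `times` read/modify/write cycles, each one under the lock."""
    for _ in range(times):
        with bus_lock:               # "assert #lock"
            value = memory["foo"]    # read
            value += delta           # modify
            memory["foo"] = value    # write
        # leaving the with-block "de-asserts #lock"

def demo(times=10000):
    """Run an incrementing and a decrementing thread against the same word."""
    memory["foo"] = 0
    a = threading.Thread(target=locked_add, args=(+1, times))
    b = threading.Thread(target=locked_add, args=(-1, times))
    a.start()
    b.start()
    a.join()
    b.join()
    return memory["foo"]

print(demo())
```

Because every read/modify/write happens entirely while the lock is held, no update can be lost, and the word ends up back at its starting value however the threads are scheduled.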

Of course "everything else has to wait" has a performance cost, so it's mostly only done when explicitly requested by software (e.g. lock inc dword [foo] and not inc dword [foo]). There are a few cases where it's done implicitly: the xchg instruction when an operand is in memory, and updates to dirty/accessed/busy flags in some of the tables the CPU uses (for paging, and GDT/LDT/IDT entries). Later (Pentium Pro, I think?), the behavior was optimized to work with the cache coherency protocol, so that #lock isn't asserted if the cache line can be put in the exclusive state instead.
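The cache-line optimization can be pictured as a simple decision, sketched below. This is an assumption-laden illustration rather than a description of any particular CPU: the 64-byte line size, the function names, and the rule "cacheable and contained in one line" are my simplification of when a line "can be put in the exclusive state".

```python
LINE_SIZE = 64  # assumption: 64-byte cache lines

def fits_in_one_cache_line(address, size):
    """True if every byte of the access lies in the same cache line."""
    first_line = address // LINE_SIZE
    last_line = (address + size - 1) // LINE_SIZE
    return first_line == last_line

def can_use_cache_lock(address, size, cacheable):
    """Hypothetical rule: holding the line exclusively is enough only when
    the memory is cacheable and the access does not straddle two lines.
    Otherwise the model falls back to the bus-wide #lock."""
    return cacheable and fits_in_one_cache_line(address, size)

print(can_use_cache_lock(0x1000, 4, True))    # aligned dword in cacheable memory
print(can_use_cache_lock(0x103E, 4, True))    # dword straddling two lines
print(can_use_cache_lock(0x1000, 4, False))   # uncacheable memory
```

In this model only the first access avoids the bus-wide lock; the other two fall back to it, which is why misaligned locked accesses are best avoided.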

Note: In the past there have been 2 CPU bugs (Intel Pentium "0xF00F" bug and Cyrix "Coma" bug) where a CPU can be tricked into asserting the #lock signal and never de-asserting it; causing the entire system to lock up because nothing can access any memory.

  1. Does this mean that other devices, such as a GPU, can't access memory while the lock is held? And can other devices actually talk to RAM directly, without going through the CPU first?

Yes. If #lock is asserted (which doesn't include cases where newer CPUs can put the cache line into the exclusive state instead), anything that accesses memory has to wait for #lock to be de-asserted.

Note: Most modern devices can/do access memory directly (to transfer data to/from RAM without using the CPU to transfer data).

  2. How does the processor know that it's in this locked state? Is it saved in a control register or in rflags, for example? I can't see how this works on a multicore CPU.

It's not saved in the contents of any register. It's literally an electronic signal on a bus or link. For an extremely over-simplified example, assume that the bus has 32 "address" wires, 32 "data" wires, plus a #lock wire, where "assert #lock" means that the voltage on that #lock wire goes from 0 volts up to 3.3 volts. When anything wants to read or write memory, it first checks that the voltage on the #lock wire is 0 volts, before attempting to change the voltages on the "address" or "data" wires.
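The over-simplified bus can be modeled in a few lines of Python. Everything here is hypothetical (the ToyBus class, the agent names "cpu0" and "gpu", and the choice to return False or None for "must wait"); the point is only that the lock state lives on the shared bus, not in any agent's registers.

```python
class ToyBus:
    """Toy shared bus: memory plus one #lock wire (illustrative model only)."""

    def __init__(self):
        self.lock_owner = None   # None means the #lock wire is de-asserted
        self.memory = {}

    def assert_lock(self, agent):
        """Try to raise #lock; fails if another agent already holds it."""
        if self.lock_owner is not None:
            return False
        self.lock_owner = agent
        return True

    def deassert_lock(self, agent):
        """Drop #lock, but only if this agent is the one holding it."""
        if self.lock_owner == agent:
            self.lock_owner = None

    def _blocked(self, agent):
        # Every agent checks the wire before driving address/data lines.
        return self.lock_owner is not None and self.lock_owner != agent

    def write(self, agent, address, value):
        """Return True if the write happened, False if the agent must wait."""
        if self._blocked(agent):
            return False
        self.memory[address] = value
        return True

    def read(self, agent, address):
        """Return the stored value, or None if the agent must wait."""
        if self._blocked(agent):
            return None
        return self.memory.get(address, 0)


bus = ToyBus()
bus.assert_lock("cpu0")
print(bus.write("gpu", 0x10, 7))    # gpu has to wait while cpu0 holds #lock
bus.write("cpu0", 0x10, 1)          # the lock holder can still access memory
bus.deassert_lock("cpu0")
print(bus.write("gpu", 0x10, 7))    # now the gpu's write goes through
```

Neither "cpu0" nor "gpu" stores any "I am locked" flag of its own; each one simply looks at the bus state before every access.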

Note: A real bus is much more complicated and needs a few other signals (eg for direction of transfer, for collision avoidance, for "I/O port or physical memory", etc); and modern buses use serial lanes and not parallel wires; and modern systems use "point to point links" and not "common bus shared by all the things".

  3. The websites I visited said it locks any shared memory. Does this mean that the whole of RAM is locked during this period, or just the memory page (or some other part of memory) that the instruction operates on?

It's better to say that the bus is locked; where everything has to use the bus to access memory (and nothing else can use the bus when the bus is locked, even when something else is trying to use the bus for something that has nothing to do with memory - eg to send an IRQ to a CPU).

Of course (due to aggressive performance optimizations - primarily the "if the cache line can be put in the exclusive state instead" optimization) it's even better to say that the hardware can do anything it feels like as long as the result behaves as if there's a shared bus that was locked (even if there isn't a shared bus and nothing was actually locked).

Note: 80x86 supports misaligned accesses (eg you can lock inc dword [address] where the access can straddle a boundary), where if a memory access does straddle a boundary the CPU needs to combine 2 or more pieces (eg a few bytes from the end of one cache line and a few bytes from the start of the next cache line). Modern virtual memory means that if the virtual address straddles a page boundary the CPU needs to access 2 different virtual pages which may have "extremely unrelated" physical addresses. If a theoretical CPU tried to implement independent locks (a different lock for each memory area) then it would also need to support asserting multiple lock signals. This can cause deadlocks - eg one CPU locks "memory page 1" then tries to lock "memory page 2" (and can't because it's locked); while another CPU locks "memory page 2" then tries to lock "memory page 1" (and can't because it's locked). To fix that the theoretical CPU would have to use "global lock ordering" - always assert locks in a specific order. The end result would be a significant amount of complexity (where it's likely that the added complexity costs more performance than it saves).
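The deadlock and the "global lock ordering" fix can be sketched in software. This is purely an illustration of the theoretical per-area design, not something real 80x86 hardware does: the 4 KiB page size, the per-page lock table, and all function names are assumptions made for the example.

```python
import threading

PAGE_SIZE = 4096  # assumption: 4 KiB pages

def pages_touched(address, size):
    """Page numbers covered by an access of `size` bytes at `address`."""
    first = address // PAGE_SIZE
    last = (address + size - 1) // PAGE_SIZE
    return list(range(first, last + 1))

page_locks = {}                  # hypothetical table: one lock per page
table_guard = threading.Lock()   # protects the table itself

def lock_for(page):
    with table_guard:
        return page_locks.setdefault(page, threading.Lock())

def locked_access(pages, action):
    """Take every needed page lock in ascending page order, then run action.
    Sorting is the global lock ordering: no two agents can each hold one
    lock while waiting for the other."""
    locks = [lock_for(p) for p in sorted(set(pages))]
    for lock in locks:
        lock.acquire()
    try:
        return action()
    finally:
        for lock in reversed(locks):
            lock.release()

def demo(rounds=2000):
    """Two threads request the same two pages in opposite orders."""
    counter = {"n": 0}

    def bump():
        counter["n"] += 1

    def worker(pages):
        for _ in range(rounds):
            locked_access(pages, bump)

    t1 = threading.Thread(target=worker, args=([1, 2],))
    t2 = threading.Thread(target=worker, args=([2, 1],))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return counter["n"]

print(pages_touched(4094, 4))   # a dword that straddles pages 0 and 1
print(demo())
```

Without the sorted() call, the two workers could each grab their first page and then wait forever for the other's. With it, both always take page 1 before page 2, so the demo finishes and the counter reaches twice the number of rounds. Even in this toy form, the extra bookkeeping hints at the complexity the hardware would have to carry.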


 