
What is the processor LOCK# signal and how does it work?

I was reading a book about assembly (intermediate level) and it mentioned that some instructions, like xchg, automatically assert the processor LOCK# signal. Searching online revealed that it gives the processor exclusive rights over any shared memory, but no specific details. That made me wonder how this right works.

  1. Does this mean that any other computer device, like a GPU or something else, can't have access to memory, for example? Also, can other devices talk directly to RAM without going through the CPU first?
  2. How does the processor know that it's in this locked state? Is it saved in a control register or RFLAGS, for example, or what? I can't see how this operation works with a multicore CPU.
  3. The websites I visited said it locks any shared memory. Does this mean that during this lock period the whole RAM is locked, or just the memory page (or part of memory, not all of it) that the instruction is performed on?

The basic problem is that some instructions read memory, modify the value read, then write a new value; and if the contents of memory change between the read and the write, (some) parallel code can end up in an inconsistent state.

A nice example is one CPU doing inc dword [foo] while another CPU does dec dword [foo]. After both instructions (on both CPUs) are executed, the value should be the same as it originally was; but both CPUs could read the old value, then both CPUs could modify it, then both CPUs could write their new value; resulting in the value being 1 higher or 1 lower than you'd expect.
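To make this lost-update race concrete, here is a minimal C sketch (my own illustration, not part of the original answer): two threads do the equivalent of inc dword [foo] and dec dword [foo] on a shared counter, once with a plain non-atomic read/modify/write and once with an atomic one, which compilers on x86 implement with a lock-prefixed instruction.

    /* race_demo.c - build with: gcc -O2 -pthread race_demo.c */
    #include <pthread.h>
    #include <stdio.h>

    #define ITERS 1000000

    static volatile int plain_counter = 0;  /* plain read/modify/write      */
    static int atomic_counter = 0;          /* updated with atomic builtins */

    static void *incrementer(void *arg)
    {
        for (int i = 0; i < ITERS; i++) {
            plain_counter++;                               /* load, add, store - not atomic */
            __atomic_add_fetch(&atomic_counter, 1, __ATOMIC_RELAXED);  /* e.g. lock add */
        }
        return NULL;
    }

    static void *decrementer(void *arg)
    {
        for (int i = 0; i < ITERS; i++) {
            plain_counter--;
            __atomic_sub_fetch(&atomic_counter, 1, __ATOMIC_RELAXED);  /* e.g. lock sub */
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, incrementer, NULL);
        pthread_create(&b, NULL, decrementer, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);

        /* atomic_counter always ends up 0; plain_counter usually doesn't */
        printf("plain=%d atomic=%d\n", plain_counter, atomic_counter);
        return 0;
    }

On x86-64 the atomic variants compile to lock add / lock sub, which is exactly the read/modify/write case where the hardware has to provide the atomicity described above.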

The solution was to use a #lock signal to prevent anything else from accessing the same piece of memory at the same time. E.g. the first CPU would assert #lock, then do its read/modify/write, then de-assert #lock; and anything else would see that #lock is asserted and have to wait until #lock is de-asserted before it can do any memory access. In other words, it's a simple form of mutual exclusion (like a spinlock, but in hardware).
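The spinlock analogy can be made concrete in software. The sketch below is my own illustration: it builds a tiny spinlock out of an atomic exchange, which on x86 compiles to xchg with a memory operand - the same instruction the book mentions as implicitly asserting LOCK#.

    /* spinlock_sketch.c - software analogue of the hardware mutual exclusion */
    typedef struct { volatile int locked; } spinlock_t;   /* 0 = free, 1 = held */

    static void spin_lock(spinlock_t *l)
    {
        /* Atomically swap 1 into l->locked; on x86 this is an (implicitly
         * locked) xchg.  Keep retrying while the old value was 1, i.e.
         * while someone else already holds the lock. */
        while (__atomic_exchange_n(&l->locked, 1, __ATOMIC_ACQUIRE) == 1)
            ;   /* busy-wait, like waiting for #lock to be de-asserted */
    }

    static void spin_unlock(spinlock_t *l)
    {
        /* Release the lock, like de-asserting #lock. */
        __atomic_store_n(&l->locked, 0, __ATOMIC_RELEASE);
    }

The difference is scope: this lock only protects whatever data the threads agree it protects, while the original #lock signal stalled every other bus agent no matter which address it wanted.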

Of course "everything else has to wait" has a performance cost;当然,“其他一切都必须等待”有性能成本; so it's mostly only done when explicitly requested by software (eg lock inc dword [foo] and not inc dword [foo] ) but there are a few cases where it's done implicitly - xchg instruction when an operand uses memory, and updates to dirty/accessed/busy flags in some of the tables the CPU uses (for paging, and GDT/LDT/IDT entries).所以它主要只在软件明确请求时完成(例如lock inc dword [foo]而不是inc dword [foo] ),但在少数情况下它是隐式完成的 - 当操作数使用 memory 时的xchg指令,并更新到dirty/ CPU 使用的某些表中的已访问/忙碌标志(用于分页和 GDT/LDT/IDT 条目)。 Also;还; later (Pentium Pro I think?), the behavior was optimized to work with cache coherency protocol so that the #lock isn't asserted if the cache line can be put in the exclusive state instead.后来(我认为是 Pentium Pro?),该行为被优化为与缓存一致性协议一起使用,因此如果缓存行可以放在专用的#lock中,则不会断言 #lock 。

Note: In the past there have been 2 CPU bugs (the Intel Pentium "0xF00F" bug and the Cyrix "Coma" bug) where a CPU could be tricked into asserting the #lock signal and never de-asserting it; causing the entire system to lock up because nothing can access any memory.

  1. Does this mean that any other computer device, like a GPU or something else, can't have access to memory, for example? Also, can other devices talk directly to RAM without going through the CPU first?

Yes. If the #lock is asserted (which doesn't include cases where newer CPUs can put the cache line into the exclusive state instead), anything that accesses memory would have to wait for #lock to be de-asserted.

Note: Most modern devices can/do access memory directly (to transfer data to/from RAM without using the CPU to transfer the data).

  2. How does the processor know that it's in this locked state? Is it saved in a control register or RFLAGS, for example, or what? I can't see how this operation works with a multicore CPU.

It's not saved in the contents of any register. It's literally an electronic signal on a bus or link. For an extremely over-simplified example, assume that the bus has 32 "address" wires, 32 "data" wires, plus a #lock wire; where "assert the #lock" means that the voltage on that #lock wire goes from 0 volts up to 3.3 volts. When anything wants to read or write memory (before attempting to change the voltages on the "address" wires or "data" wires) it checks that the voltage on the #lock wire is 0 volts.

Note: A real bus is much more complicated and needs a few other signals (e.g. for direction of transfer, for collision avoidance, for "I/O port or physical memory", etc); and modern buses use serial lanes and not parallel wires; and modern systems use "point to point links" and not "a common bus shared by all the things".
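Purely as a thought model (my own sketch, not real hardware and not from the original answer), the over-simplified bus above can be written down in a few lines of C: one shared "lock wire", and every agent has to check it before driving the address/data wires.

    /* bus_model.c - toy model of the over-simplified bus described above */
    #include <stdbool.h>

    struct bus {
        bool lock_wire;        /* #lock: false = 0 V, true = 3.3 V */
        unsigned address;      /* the 32 "address" wires           */
        unsigned data;         /* the 32 "data" wires              */
    };

    /* An agent that wants to do a locked read/modify/write: */
    static void locked_rmw(struct bus *bus, unsigned addr,
                           unsigned (*modify)(unsigned))
    {
        while (bus->lock_wire)     /* wait until #lock is de-asserted */
            ;
        bus->lock_wire = true;     /* assert #lock                    */
        bus->address = addr;       /* drive the address wires,        */
        unsigned old = bus->data;  /* read,                           */
        bus->data = modify(old);   /* modify, write,                  */
        bus->lock_wire = false;    /* then de-assert #lock            */
    }

    /* Every other agent (another CPU, a DMA-capable device, ...) does the
     * same check on bus->lock_wire before touching address/data. */

The check-then-assert step is of course where real hardware needs bus arbitration, so that two agents can't grab the bus at the same instant; the model only shows the order of operations.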

  3. The websites I visited said it locks any shared memory. Does this mean that during this lock period the whole RAM is locked, or just the memory page (or part of memory, not all of it) that the instruction is performed on?

It's better to say that the bus is locked; where everything has to use the bus to access memory (and nothing else can use the bus while the bus is locked, even when something else is trying to use the bus for something that has nothing to do with memory - e.g. to send an IRQ to a CPU).

Of course (due to aggressive performance optimizations - primarily the "if the cache line can be put in the exclusive state instead" optimization) it's even better to say that the hardware can do anything it feels like, as long as the result behaves as if there's a shared bus that was locked (even if there isn't a shared bus and nothing was actually locked).

Note: 80x86 supports misaligned accesses (e.g. you can lock inc dword [address] where the access straddles a boundary), and if a memory access does straddle a boundary the CPU needs to combine 2 or more pieces (e.g. a few bytes from the end of one cache line and a few bytes from the start of the next cache line). Modern virtual memory means that if the virtual address straddles a page boundary, the CPU needs to access 2 different virtual pages which may have "extremely unrelated" physical addresses. If a theoretical CPU tried to implement independent locks (a different lock for each memory area) then it would also need to support asserting multiple lock signals. This can cause deadlocks - e.g. one CPU locks "memory page 1" then tries to lock "memory page 2" (and can't, because it's locked); while another CPU locks "memory page 2" then tries to lock "memory page 1" (and can't, because it's locked). To fix that, the theoretical CPU would have to use "global lock ordering" - always assert locks in a specific order. The end result would be a significant amount of complexity (where it's likely that the added complexity costs more performance than it saves).
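To see what such a boundary-straddling locked access looks like from software, here is a hedged C sketch (my own illustration): it deliberately places a 4-byte counter across a 64-byte cache-line boundary and performs an atomic increment on it. On x86 this is the "split lock" case where the CPU can't use a single exclusive cache line and falls back to a bus-level lock (and recent CPUs/kernels can be configured to warn about or fault on it).

    /* split_lock_sketch.c - a locked RMW that straddles a cache-line boundary */
    #include <stdint.h>
    #include <stdio.h>

    /* 128 bytes aligned to 64; offset 62 puts a 4-byte value across the
     * boundary between the first and second cache line. */
    static _Alignas(64) unsigned char buffer[128];

    int main(void)
    {
        /* Deliberately misaligned - strictly undefined behaviour in ISO C,
         * but it is exactly what produces a split-locked access on x86. */
        uint32_t *straddler = (uint32_t *)(buffer + 62);

        /* Compiles to a lock-prefixed add on the misaligned address. */
        __atomic_add_fetch(straddler, 1, __ATOMIC_SEQ_CST);

        printf("value = %u\n", *straddler);
        return 0;
    }

The deadlock argument above is one reason real x86 CPUs never tried to lock the two cache lines or pages independently; the straddling case is simply handled as a single (slow) bus-locked operation.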
