
Data Races in JavaScript?

Let's assume I run this piece of code.

var score = 0;
for (var i = 0; i < arbitrary_length; i++) {
     async_task(i, function() { score++; }); // increment callback function
}

In theory, I understand that this presents a data race: two threads trying to increment at the same time may result in a single increment. However, Node.js (and JavaScript) are known to be single-threaded. Am I guaranteed that the final value of score will be equal to arbitrary_length?

Am I guaranteed that the final value of score will be equal to arbitrary_length?

Yes, as long as every async_task() call invokes its callback once and only once, you are guaranteed that the final value of score will be equal to arbitrary_length.
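A minimal sketch that can be run in Node.js to check this. The question does not show the real async_task(), so the version here is a hypothetical stand-in built on setTimeout(), and runTasks() is a helper name introduced only for this example.

```javascript
// Hypothetical stand-in for async_task: calls the callback exactly once, later.
function async_task(i, callback) {
    setTimeout(callback, Math.random() * 10);
}

// Starts `length` tasks and resolves with the final score after every
// callback has run.
function runTasks(length) {
    var score = 0;
    var pending = [];
    for (var i = 0; i < length; i++) {
        pending.push(new Promise(function (done) {
            async_task(i, function () {
                score++;   // runs to completion, never interleaved with another callback
                done();
            });
        }));
    }
    return Promise.all(pending).then(function () { return score; });
}

runTasks(1000).then(function (score) {
    console.log(score); // 1000
});
```

If the stand-in skipped a callback or called one twice, the count would be off, which is the "once and only once" condition above.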

It is the single-threaded nature of Javascript that guarantees that there are never two pieces of Javascript running at the exact same time. Instead, because of the event driven nature of Javascript in both browsers and node.js, one piece of JS runs to completion, then the next event is pulled from the event queue and that triggers a callback which will also run to completion.

There is no such thing as interrupt driven Javascript (where some callback might interrupt some other piece of Javascript that is currently running). Everything is serialized through the event queue. This is an enormous simplification and prevents a lot of sticky situations that would otherwise be a lot of work to program safely when you have either multiple threads running concurrently or interrupt driven code.

There are still some concurrency issues to be concerned about, but they have more to do with shared state that multiple asynchronous callbacks can all access. While only one callback will ever be accessing it at any given time, a piece of code that spans several asynchronous operations can still leave some state "in between" while it waits on those operations, at a point where some other async callback can run and attempt to access that data.
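A minimal sketch of that kind of "in between" state, assuming a made-up account object and a delay() helper that simulates async I/O (neither comes from the original answer). Each function reads shared state, yields to the event loop, and then writes back a value based on the old read.

```javascript
// Simulated async I/O (an assumption for this example): resolves after `ms`.
function delay(ms) {
    return new Promise(function (resolve) { setTimeout(resolve, ms); });
}

// Reads the balance, yields to the event loop, then writes back a value
// computed from the earlier (possibly stale) read.
async function unsafeWithdraw(account, amount, ms) {
    var current = account.balance;       // read
    await delay(ms);                     // other callbacks may run here
    account.balance = current - amount;  // write based on the old read
}

async function demo() {
    var account = { balance: 100 };
    await Promise.all([
        unsafeWithdraw(account, 30, 10),
        unsafeWithdraw(account, 50, 20)
    ]);
    return account.balance;
}

demo().then(function (balance) {
    console.log(balance); // 50, not 20: the first withdrawal was overwritten
});
```

No two lines ever run at the same time here. The lost update comes purely from the gap between the read and the write, where control was handed back to the event loop.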

You can read more about the event driven nature of Javascript here: How does JavaScript handle AJAX responses in the background? That answer also contains a number of other references.

And another similar answer that discusses the kind of shared data race conditions that are possible: Can this code cause a race condition in socket io?

Some other references:

How do I prevent event handlers from handling multiple events at once in javascript?

Do I need to be concerned with race conditions with asynchronous Javascript?

JavaScript - When exactly does the call stack become "empty"?

Node.js server with multiple concurrent requests, how does it work?


To give you an idea of the concurrency issues that can happen in Javascript (even without threads and without interrupts), here's an example from my own code.

I have a Raspberry Pi node.js server that controls the attic fans in my house. Every 10 seconds it checks two temperature probes, one inside the attic and one outside the house, and decides how it should control the fans (via relays). It also records temperature data that can be presented in charts. Once an hour, it saves the latest temperature data that was collected in memory to some files for persistence in case of power outage or server crash. That saving operation involves a series of async file writes. Each one of those async writes yields control back to the system and then continues when the async callback is called to signal completion. Because this is a low memory system and the data can potentially occupy a significant portion of the available RAM, the data is not copied in memory before writing (that's simply not practical). So, I'm writing the live in-memory data to disk.

At any time during any of these async file I/O operations, while waiting for a callback to signify completion of the many file writes involved, one of my timers in the server could fire. I'd then collect a new set of temperature data, and that would attempt to modify the in-memory data set that I'm in the middle of writing. That's a concurrency issue waiting to happen. If it changes the data while I've written part of it and am waiting for that write to finish before writing the rest, then the data that gets written can easily end up corrupted: I will have written out one part of the data, the data will have been modified out from underneath me, and then I will attempt to write out more data without realizing it's been changed. That's a concurrency issue.

I actually have a console.log() statement that explicitly logs when this concurrency issue occurs on my server (and is handled safely by my code). It happens once every few days on my server. I know it's there and it's real.

There are many ways to work around those types of concurrency issues. The simplest would have been to just make a copy in memory of all the data and then write out the copy. Because there are no threads or interrupts, making a copy in memory would be safe from concurrency (there would be no yielding to async operations in the middle of the copy to create a concurrency issue). But that wasn't practical in this case. So, I implemented a queue. Whenever I start writing, I set a flag on the object that manages the data. Then, anytime the system wants to add or modify data in the stored data while that flag is set, those changes just go into a queue. The actual data is not touched while that flag is set. When the data has been safely written to disk, the flag is reset and the queued items are processed. Any concurrency issue is safely avoided.
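The flag-and-queue approach described above could be sketched roughly like this. The original server code is not shown, so every name here (TemperatureStore, add, save, writeChunk) is hypothetical, and the sketch ignores error reporting and overlapping saves.

```javascript
// Sketch of a store that defers modifications while a save is in progress.
// All names are hypothetical; this is not the original server code.
class TemperatureStore {
    constructor() {
        this.data = [];       // live in-memory readings
        this.saving = false;  // the flag: true while an async save is running
        this.pending = [];    // changes queued while the flag is set
    }

    add(reading) {
        if (this.saving) {
            this.pending.push(reading);  // do not touch live data mid-save
        } else {
            this.data.push(reading);
        }
    }

    // writeChunk is a caller-supplied async function (an assumption of this sketch).
    async save(writeChunk) {
        this.saving = true;
        try {
            for (const reading of this.data) {
                await writeChunk(reading);  // yields to the event loop each time
            }
        } finally {
            this.saving = false;
            // apply the queued changes now that the write has finished
            for (const reading of this.pending) this.data.push(reading);
            this.pending = [];
        }
    }
}
```

The flag is an ordinary property. That is enough because add() and the synchronous parts of save() each run to completion without being interrupted.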


So, this is an example of the concurrency issues that you do have to be concerned about. One great simplifying assumption with Javascript is that a piece of Javascript will run to completion without any threat of getting interrupted as long as it doesn't purposely return control back to the system. That makes handling concurrency issues like the one described above much easier, because your code will never be interrupted except when you consciously yield control back to the system. This is why we don't need mutexes and semaphores and other things like that in our own Javascript. We can use simple flags (just a regular Javascript variable) like I described above if needed.


In any entirely synchronous piece of Javascript, you will never be interrupted by other Javascript. A synchronous piece of Javascript will run to completion before the next event in the event queue is processed. This is what is meant by Javascript being an "event-driven" language. As an example of this, if you had this code:

 console.log("A");
 // schedule timer for 500 ms from now
 setTimeout(function() {
     console.log("B");
 }, 500);

 console.log("C");

 // spin for 1000ms
 var start = Date.now();
 while (Date.now() - start < 1000) {}

 console.log("D");

You would get the following in the console:

A
C
D
B

The timer event cannot be processed until the current piece of Javascript runs to completion, even though it was likely added to the event queue sooner than that. The way the JS interpreter works is that it runs the current JS until it returns control back to the system and then (and only then), it fetches the next event from the event queue and calls the callback associated with that event.

Here's the sequence of events under the covers.

  1. This JS starts running.
  2. console.log("A") runs and A is output.
  3. A timer event is scheduled for 500ms from now. The timer subsystem uses native code.
  4. console.log("C") runs and C is output.
  5. The code enters the spin loop.
  6. At some point part-way through the spin loop, the previously set timer is ready to fire. It is up to the interpreter implementation to decide exactly how this works, but the end result is that a timer event is inserted into the Javascript event queue.
  7. The spin loop finishes.
  8. console.log("D") runs and D is output.
  9. This piece of Javascript finishes and returns control back to the system.
  10. The Javascript interpreter sees that the current piece of Javascript is done, so it checks the event queue to see if there are any pending events waiting to run. It finds the timer event and a callback associated with that event and calls that callback (starting a new block of JS execution). That code starts running and B is output by console.log("B").
  11. That setTimeout() callback finishes execution and the interpreter again checks the event queue to see if there are any other events that are ready to run.

Node uses an event loop. You can think of this as a queue. So we can assume that your for loop puts the function() { score++; } callback on this queue arbitrary_length times. After that, the JS engine runs these one by one and increases score each time. So yes. The only exception is if a callback is not called or the score variable is accessed from somewhere else.

Actually, you can use this pattern to run tasks in parallel, collect the results, and call a single callback when every task is done.

var results = [];
for (var i = 0; i < arbitrary_length; i++) {
     async_task(i, function(result) {
          results.push(result);
          if (results.length == arbitrary_length)
               tasksDone(results);
     });
}
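For comparison, here is a hedged sketch of the same idea using Promise.all(), which keeps results in index order rather than completion order. The async_task() shown is a hypothetical stand-in that reports i * 2 after a random delay, and taskAsPromise() and runAll() are names made up for this example.

```javascript
// Hypothetical stand-in for async_task: reports i * 2 after a random delay.
function async_task(i, callback) {
    setTimeout(function () { callback(i * 2); }, Math.random() * 10);
}

// Wrap one callback-style task in a Promise.
function taskAsPromise(i) {
    return new Promise(function (resolve) { async_task(i, resolve); });
}

// Resolves with all results, in index order regardless of completion order.
function runAll(length) {
    var tasks = [];
    for (var i = 0; i < length; i++) tasks.push(taskAsPromise(i));
    return Promise.all(tasks);
}

runAll(5).then(function (results) {
    console.log(results); // [ 0, 2, 4, 6, 8 ]
});
```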

No two invocations of the function can happen at the same time (because node is single threaded), so that will not be a problem. The only problem would be if, in some cases, async_task(..) drops the callback. But if, e.g., async_task(..) was just calling setTimeout(..) with the given function, then yes, each call will execute, they will never collide with each other, and score will have the expected value, arbitrary_length, at the end.

Of course, arbitrary_length can't be so great as to exhaust memory, or overflow whatever collection is holding these callbacks. There is no threading issue, however.

I do think it's worth noting, for others who view this, that there is a common mistake lurking near this code. If the callback itself read the variable i, you would need to declare i with let or copy it to another variable first; with var, every callback would see the last value of i. As written, i is only passed as an argument to async_task(), so it is evaluated at call time and each call receives its own value.
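A small sketch of the difference, using setTimeout() in place of async_task() (collect() and record() are names made up for this example). It applies when the callback itself reads the loop variable; a value passed as an argument, as in async_task(i, ...), is evaluated when the call is made.

```javascript
// Collect what each deferred callback sees for the loop variable.
function collect(useLet) {
    return new Promise(function (resolve) {
        var seen = [];
        function record(value) {
            seen.push(value);
            if (seen.length === 3) resolve(seen);
        }
        if (useLet) {
            for (let i = 0; i < 3; i++) {
                setTimeout(function () { record(i); }, 0); // fresh i per iteration
            }
        } else {
            for (var j = 0; j < 3; j++) {
                setTimeout(function () { record(j); }, 0); // one shared j
            }
        }
    });
}

collect(false).then(function (seen) { console.log(seen); }); // [ 3, 3, 3 ]
collect(true).then(function (seen) { console.log(seen); });  // [ 0, 1, 2 ]
```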
