How do futures work to improve concurrency under the hood?

I am trying to wrap my head around how futures work under the hood. I am familiar with the concept in both Java and Scala. I've been using futures in PlayFramework to keep blocking operations from occupying my connection threads. That has worked well for the most part, but I still feel that I am missing some of what happens under the hood, particularly when it comes to keeping the number of threads used for executing blocking operations low.

My general assumption is (correct me if I am wrong) that there should be a single thread (in the simplest case) running an endless loop over a collection of pending futures. On every turn, the thread sleeps a little, then picks the futures one by one, checks whether some results have arrived, and if so, returns them and removes the finished futures from the collection.

IMHO, this can only work if the underlying operations are non-blocking too. Otherwise, logic tells me that those should be isolated in their own separate threads, as part of a pool. My train of thought crashes at the point where each operation, even within a future, is fundamentally blocking. Then I would assume that in the worst-case scenario we would once again end up with one thread per blocking operation, even when wrapped in futures.

The problem is that a large portion of the widely used IO code in Java is fundamentally blocking. This means that executing 15 JDBC operations wrapped in futures will still spin off 15 threads. Otherwise, we would have to call them sequentially on a single thread, which is even worse.

What I am trying to say is that wrapping fundamentally blocking IO operations in futures should, in theory, not help at all. Am I right or wrong? Please help me build the puzzle.

"What I am trying to say is that wrapping fundamentally blocking IO operations in futures, should in theory not help at all. Am I right or wrong?"

It depends on what you mean by "help". It will not magically make your I/O operation non-blocking, that's true (and a lot of people get confused by that).

What it will do, depending on how the thread pool is configured, is make sure your I/O (or whatever blocking operation) runs in a dedicated thread, leaving the "core" threads free to run.

Let's say your pool has as many threads as your machine has cores. If you want to perform a blocking operation, you can hint that to the pool, and it can create an extra thread for this operation, since it knows that this thread won't use much of the CPU.
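In Scala, one way to give that hint is `scala.concurrent.blocking`. Here is a minimal sketch, assuming the default global ExecutionContext; `slowLookup` is a made-up stand-in for a real blocking call such as a JDBC query, and the object and method names are only for illustration.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object BlockingHintSketch {
  // Hypothetical stand-in for a blocking call (e.g. a JDBC query).
  def slowLookup(id: Int): Int = {
    Thread.sleep(200)
    id * 2
  }

  // Each call is wrapped in `blocking { ... }`, which tells the global pool
  // that the thread is about to block so it may add a temporary extra thread.
  def lookupAll(ids: Seq[Int]): Seq[Int] = {
    val futures = ids.map(id => Future(blocking(slowLookup(id))))
    Await.result(Future.sequence(futures), 30.seconds)
  }
}
```

Calling `BlockingHintSketch.lookupAll(1 to 16)` returns the doubled ids. The operations are still blocking; the hint only keeps them from starving the CPU-bound work that shares the pool.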

Another common practice is to use a dedicated ExecutionContext for blocking operations, to isolate their execution from the rest of your program.
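A dedicated context could look roughly like the sketch below. It assumes a fixed-size pool is acceptable for the blocking work; `blockingQuery`, `runQueries` and the pool size are hypothetical names and values, not anything Play or JDBC prescribes.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object DedicatedPoolSketch {
  // Hypothetical blocking call; a real one might be a JDBC query.
  def blockingQuery(n: Int): String = {
    Thread.sleep(100)
    s"row-$n"
  }

  // Runs `count` blocking queries on their own fixed-size pool, so they
  // never compete with the threads that serve the rest of the program.
  def runQueries(count: Int, poolSize: Int): Seq[String] = {
    val pool = Executors.newFixedThreadPool(poolSize)
    implicit val blockingEc: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val futures = (1 to count).map(n => Future(blockingQuery(n)))
      Await.result(Future.sequence(futures), 30.seconds)
    } finally pool.shutdown() // release the worker threads when done
  }
}
```

The pool size caps how many blocking calls run at once; anything beyond that waits in the executor's queue instead of spawning more threads.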

While there are still some helpful aspects to using futures even if the "underlying" I/O happens via a pool of blocking threads, you're right that this usually wouldn't provide some of the big advantages of Futures. The piece you're missing is the possibility of doing I/O that is truly non-blocking, i.e. ultimately calling select or similar interfaces at the system call level. The Java NIO interfaces can do this, as can a number of frameworks built on top of them (e.g. Netty), or callback-oriented libraries like Apache's HttpAsyncClient.
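To show what a callback-completed future looks like, here is a minimal sketch that uses a single timer thread as a stand-in for an event loop. It does not do real select-based I/O. `asyncOp` and `sumOfResults` are hypothetical names; the point is only that no thread is parked per pending operation.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future, Promise}
import scala.concurrent.duration._

object CallbackFutureSketch {
  // Starts `count` simulated async operations and returns the sum of their results.
  def sumOfResults(count: Int): Int = {
    // One timer thread plays the role of an event loop / selector thread.
    val timer = Executors.newSingleThreadScheduledExecutor()
    implicit val ec: ExecutionContext = ExecutionContext.global

    // Hypothetical async operation: the result arrives through a callback
    // after `delayMs`; nothing blocks while it is pending.
    def asyncOp(value: Int, delayMs: Long): Future[Int] = {
      val promise = Promise[Int]()
      timer.schedule(new Runnable {
        def run(): Unit = { promise.success(value); () }
      }, delayMs, TimeUnit.MILLISECONDS)
      promise.future
    }

    try {
      val all = Future.sequence((1 to count).map(i => asyncOp(i, 100L)))
      Await.result(all, 30.seconds).sum
    } finally timer.shutdown()
  }
}
```

A call like `CallbackFutureSketch.sumOfResults(1000)` keeps a thousand futures pending on one timer thread, which is the kind of ratio that blocking calls wrapped in futures can't reach.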

Unfortunately, there is no async replacement for full JDBC that I'm aware of, though there is e.g. postgresql-async, which covers at least some of the use cases.

"My general assumption is (correct me if I am wrong) that there should be a single thread (in the simplest case), running an endless loop over a collection of pending futures". This isn't always true. It depends on the ExecutionContext that is being used. You can have an ExecutionContext that creates a new thread for each Future if you want, or you can have a fixed pool of threads that run your Futures. There are other options as well. It really depends on what you are trying to do.

IMO, the real power of Futures comes from the programming model they allow. Once you understand the model, you can compose Futures pretty easily. Then, if you need to alter the concurrency model a little, you can do that by configuring your ExecutionContexts and using multiple/different ExecutionContexts per group of tasks. In a real/large application multiple ExecutionContexts are used, not just the default one.
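As a rough illustration of how much the ExecutionContext decides, the sketch below runs the same Futures on whatever executor it is given and counts the distinct worker threads that were used. `distinctThreads` is a hypothetical helper written for this example.

```scala
import java.util.concurrent.ExecutorService
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ThreadCountSketch {
  // Runs `tasks` sleeping tasks on the given pool and counts how many
  // distinct worker threads ended up executing them.
  def distinctThreads(pool: ExecutorService, tasks: Int): Int = {
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      val names = Future.sequence((1 to tasks).map { _ =>
        Future {
          Thread.sleep(100) // stand-in for a blocking call
          Thread.currentThread().getName
        }
      })
      Await.result(names, 30.seconds).distinct.size
    } finally pool.shutdown()
  }
}
```

With `Executors.newFixedThreadPool(2)` the count can't exceed 2 however many tasks are submitted, while `Executors.newCachedThreadPool()` would typically end up with about one thread per simultaneously blocked task. The Future code itself is identical in both cases.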

You mentioned blocking IO and JDBC, so I'll give an example related to that. In the past I've used Futures with JDBC to do concurrent inserts into different tables. If the inserts are unrelated (no FK, for example), and assuming you have an ExecutionContext configured with at least as many threads as JDBC connections, then the caller only waits for the longest-running insert, not the sum of all insert times. For example:

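import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// functions that insert into tables
// each wraps a blocking JDBC call in a Future and returns the primary key as Future[Long]
// an implicit ExecutionContext (preferably a dedicated one) must be in scope
def insertIntoTable1(col1: String, col2: String): Future[Long] = ...
// insertIntoTable2, insertIntoTable3 and insertIntoParentTable follow the same pattern

def persistData(... input data for all tables ...) = {
  // get Future primary keys; each insert starts running as soon as its Future is created
  val futurePrimaryKey1 = insertIntoTable1(...)
  val futurePrimaryKey2 = insertIntoTable2(...)
  val futurePrimaryKey3 = insertIntoTable3(...)

  // inserts for id1 ... idN are done concurrently;
  // the parent insert starts only once all of their keys are available
  val futureParentTablePrimaryKey: Future[Long] = for {
    id1 <- futurePrimaryKey1
    id2 <- futurePrimaryKey2
    id3 <- futurePrimaryKey3
    // bind with <- rather than yield, so the result is Future[Long] and not Future[Future[Long]]
    parentId <- insertIntoParentTable(id1, id2, id3)
  } yield parentId

  // wait for final insert
  Await.result(futureParentTablePrimaryKey, Duration.Inf)
  // of course if you wanted the caller to know about Futures, don't do the Await.
}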

It's not obvious from the pseudocode, but a non-default ExecutionContext should be used. So, in this particular example, blocking IO is all over the place. But the key is that the waiting is concurrent and the programming model is simple to understand.
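For anyone who wants to run the idea without a database, here is a self-contained sketch in which `Thread.sleep` stands in for the blocking JDBC inserts. `fakeInsert`, the fake keys and the delays are all made up for illustration, and the pool of three threads is an assumption matching the three concurrent inserts.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ConcurrentInsertSketch {
  // Hypothetical stand-in for a blocking JDBC insert: sleeps, then returns a fake key.
  private def fakeInsert(key: Long, millis: Long)(implicit ec: ExecutionContext): Future[Long] =
    Future { Thread.sleep(millis); key }

  // Returns (parent key, elapsed milliseconds).
  def persistData(): (Long, Long) = {
    val pool = Executors.newFixedThreadPool(3) // one thread per concurrent insert
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    val start = System.nanoTime()
    try {
      // All three child inserts start immediately and wait concurrently.
      val f1 = fakeInsert(1L, 300)
      val f2 = fakeInsert(2L, 200)
      val f3 = fakeInsert(3L, 100)
      val parent = for {
        id1 <- f1
        id2 <- f2
        id3 <- f3
        parentId <- fakeInsert(id1 + id2 + id3, 100) // runs after all three keys exist
      } yield parentId
      val key = Await.result(parent, 30.seconds)
      (key, (System.nanoTime() - start) / 1000000)
    } finally pool.shutdown()
  }
}
```

Run sequentially, the four sleeps would add up to about 700 ms. Here the elapsed time should be closer to the longest child insert plus the parent insert, even though every call still blocks its thread.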
