Producer Consumer with Java (Streams) Multi [Threading/Processing]

Original post: 2018-05-01 18:51:20. Tags: java, multithreading, lambda, aws-lambda, java-stream

I am working on a project which is essentially a long chain of producers/consumers. The first process takes data from the user (a huge CSV file), processes it line by line, and passes it on to another process/thread, which consumes the data, processes it, and passes it on to the next one, and so on.
The chain is around 8-10 units long, each unit acting as a consumer and then as a producer.
I have thought of using AWS Lambda for this. I could also use Java streams. The advantage I see in AWS Lambda is that you can put an individual throttling limit on each node.
So, if a node's job is to update a DynamoDB record, we could throttle that node to match DynamoDB's write units, and so on.
Another advantage I see with lambda is that I don't have to write code to manage multi-processing (or multi-threading), and my data processing won't depend on my chosen hardware. I could also save cost by choosing low-grade hardware whose only job would be to act as the first producer, but I'd still be paying for AWS Lambda.

Is working with Java streams similar if I use Java lambdas in the same way I would use AWS Lambda? Can I use throttling with Java lambdas?
If I use Java streams, is there an easy way to manage multi-processing (threading)?
Apart from throttling and managing pools, are there any other advantages of using lambda? Are there any disadvantages?
Are there any other alternatives apart from the above two?
What if I want multiple consumers for certain nodes in the chain? E.g., a consumer consumes data, processes it, and passes it on to the next one in the chain, but we also have to log the data or store it in a DB.

2 Answers

It looks like reactive streams (rather than Java streams or AWS Lambda) are the most suitable tool for your task. They provide:

backpressure, that is, balancing the speed of consumers and producers
parallel execution of all the steps of the pipeline chain
the ability to connect multiple consumers to the same producer

There are a number of reactive streams implementations: RxJava 2, Project Reactor (included in Spring 5), Akka Streams, and others.
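
To make the three points above concrete, here is a minimal sketch that uses only the JDK's own reactive streams interfaces (`java.util.concurrent.Flow` and `SubmissionPublisher`, available since Java 9) instead of any of the libraries listed. The class names `PipelineSketch`, `Stage` and `Sink`, the buffer size of 8, the pool size of 4 and the upper-casing step are all assumptions made for illustration; a real chain would likely use one of the libraries above and have more stages.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.function.Function;

class PipelineSketch {

    /** Hypothetical middle unit of the chain: consumes T, produces R. */
    static final class Stage<T, R> extends SubmissionPublisher<R>
            implements Flow.Processor<T, R> {
        private final Function<T, R> fn;
        private Flow.Subscription upstream;

        Stage(ExecutorService executor, int bufferSize, Function<T, R> fn) {
            super(executor, bufferSize);
            this.fn = fn;
        }

        @Override public void onSubscribe(Flow.Subscription s) {
            upstream = s;
            s.request(1); // ask for one item at a time
        }

        @Override public void onNext(T item) {
            // submit() blocks while any downstream buffer is full, so a slow
            // consumer slows this stage down (backpressure).
            submit(fn.apply(item));
            upstream.request(1);
        }

        @Override public void onError(Throwable t) { closeExceptionally(t); }
        @Override public void onComplete() { close(); }
    }

    /** Hypothetical terminal consumer that just records what it receives. */
    static final class Sink<T> implements Flow.Subscriber<T> {
        final List<T> seen = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);
        }
        @Override public void onNext(T item) {
            seen.add(item);
            subscription.request(1);
        }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    /** Runs source -> upper-case stage -> two independent consumers. */
    static List<List<String>> run(List<String> lines) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        SubmissionPublisher<String> source = new SubmissionPublisher<>(pool, 8);
        Stage<String, String> upperCase = new Stage<>(pool, 8, String::toUpperCase);
        Sink<String> next = new Sink<>();   // stands in for the next unit in the chain
        Sink<String> audit = new Sink<>();  // stands in for logging or a DB write

        source.subscribe(upperCase);
        upperCase.subscribe(next);
        upperCase.subscribe(audit);

        lines.forEach(source::submit); // blocks when the stage falls behind
        source.close();                // completion propagates down the chain

        next.done.await();
        audit.done.await();
        pool.shutdown();
        return List.of(next.seen, audit.seen);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(List.of("a", "b", "c"))); // prints [[A, B, C], [A, B, C]]
    }
}
```

Each stage runs on the shared executor, so the steps execute in parallel, and attaching a second subscriber to the same stage covers the "also log or store it" case without changing the producer.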

Sounds like you should use Step Functions to chain the lambdas together.