Producer Consumer with Java (Streams) Multi [Threading/Processing]
I am working on a project which is essentially a long chain of producers/consumers. This means that the first process takes data from the user (a huge CSV file), processes it line by line, and passes it on to another process/thread, which consumes this data, processes it, and then passes it on to another one, and so on.
The chain is around 8-10 units long, each unit acting as a consumer and then a producer.
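For reference, a chain like this can be sketched in plain Java with one thread per unit and a bounded `BlockingQueue` between neighbours. This is only a minimal sketch under assumptions: the class name `QueuePipeline`, the end-of-input sentinel, and the queue capacity are made up for illustration, and each unit is reduced to a simple `String -> String` function.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Hypothetical sketch: each unit consumes from one queue and produces to the next.
class QueuePipeline {
    // End-of-input marker, compared by identity so no data line can be mistaken for it.
    private static final String POISON = new String("__END__");

    // Starts one unit: take from `in`, apply `step`, put the result on `out`.
    static Thread stage(BlockingQueue<String> in, BlockingQueue<String> out,
                        Function<String, String> step) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    String item = in.take();
                    if (item == POISON) {      // forward the marker and stop
                        out.put(POISON);
                        return;
                    }
                    out.put(step.apply(item)); // blocks while the next unit is behind
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Wires the units together, feeds the lines in, and collects the final output.
    static List<String> run(List<String> lines, List<Function<String, String>> steps,
                            int capacity) {
        BlockingQueue<String> first = new ArrayBlockingQueue<>(capacity);
        BlockingQueue<String> current = first;
        for (Function<String, String> step : steps) {
            BlockingQueue<String> next = new ArrayBlockingQueue<>(capacity);
            stage(current, next, step);
            current = next;
        }
        BlockingQueue<String> last = current;

        // Feed from a separate thread so this thread can drain the last queue.
        Thread feeder = new Thread(() -> {
            try {
                for (String line : lines) {
                    first.put(line);
                }
                first.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        feeder.start();

        List<String> results = new ArrayList<>();
        try {
            for (String item = last.take(); item != POISON; item = last.take()) {
                results.add(item);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining the pipeline", e);
        }
        return results;
    }
}
```

Because the queues are bounded, a slow unit makes the units before it block on `put`, so the whole chain runs at the pace of its slowest member instead of piling data up in memory.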
I have thought of using AWS Lambda functions for this. I could also use Java streams. The advantage I see in AWS Lambda is that you can put an individual throttling limit on each node.
So, if a node's job is to update a DynamoDB record, we could throttle this node to match the write units of DynamoDB, and so on.
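To make the per-node throttling idea concrete outside of Lambda, here is a rough plain-Java sketch of a pacing throttle that a single node could call before each write. The class name `StageThrottle` and the permits-per-second figure are assumptions for illustration; this is not how AWS Lambda or DynamoDB implement their limits.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: spaces calls out so that at most `permitsPerSecond`
// of them pass per second (evenly paced, no bursts).
class StageThrottle {
    private final long intervalNanos; // minimum spacing between two permits
    private long nextSlot;            // earliest time the next permit may be used

    StageThrottle(int permitsPerSecond) {
        this.intervalNanos = TimeUnit.SECONDS.toNanos(1) / permitsPerSecond;
        this.nextSlot = System.nanoTime();
    }

    // Blocks the caller until its time slot arrives.
    void acquire() {
        long wait;
        synchronized (this) {
            long now = System.nanoTime();
            wait = nextSlot - now;
            // Reserve the following slot, starting from whichever is later: now or nextSlot.
            nextSlot = (wait > 0 ? nextSlot : now) + intervalNanos;
        }
        if (wait > 0) {
            try {
                TimeUnit.NANOSECONDS.sleep(wait); // sleep outside the lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

A node that writes to a table would call `throttle.acquire()` before each write, with `permitsPerSecond` set to whatever write rate that node is allowed.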
Another advantage I see with Lambda is that I don't have to write code to manage multi-processing (or multi-threading), and my data processing won't depend on the hardware I choose. I could also save cost by choosing low-grade hardware whose only job would be to act as the first producer, but I'd still be paying for AWS Lambda.
It looks like reactive streams (and not Java streams or AWS Lambda functions) are the most suitable tool for your task. They provide:
There are a number of reactive streams implementations: RxJava 2, Project Reactor (included in Spring 5), Akka Streams, and others.
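To give a feel for the programming model without pulling in any of these libraries, here is a minimal sketch using the JDK's own `java.util.concurrent.Flow` interfaces and `SubmissionPublisher` (Java 9+), which mirror the Reactive Streams interfaces. The class names `FlowPipeline`, `MapProcessor`, and `Collector`, and the two trivial steps (trim, upper-case), are made up for illustration. A real chain would more likely use the operators of one of the libraries above.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.function.Function;

// Hypothetical sketch: a two-unit chain built on the JDK Flow API.
class FlowPipeline {
    // A unit that is both a consumer (Subscriber) and a producer (Publisher).
    static class MapProcessor<T, R> extends SubmissionPublisher<R>
            implements Flow.Processor<T, R> {
        private final Function<T, R> step;
        private Flow.Subscription subscription;

        MapProcessor(Function<T, R> step) { this.step = step; }

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);                 // ask upstream for one item at a time
        }
        @Override public void onNext(T item) {
            submit(step.apply(item));     // blocks if the downstream buffer is full
            subscription.request(1);
        }
        @Override public void onError(Throwable t) { closeExceptionally(t); }
        @Override public void onComplete() { close(); }
    }

    // The last consumer: stores what it receives and signals when the stream ends.
    static class Collector<T> implements Flow.Subscriber<T> {
        final List<T> items = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(1);
        private Flow.Subscription subscription;

        @Override public void onSubscribe(Flow.Subscription s) {
            subscription = s;
            s.request(1);
        }
        @Override public void onNext(T item) {
            items.add(item);
            subscription.request(1);
        }
        @Override public void onError(Throwable t) { done.countDown(); }
        @Override public void onComplete() { done.countDown(); }
    }

    static List<String> run(List<String> lines) {
        Collector<String> sink = new Collector<>();
        // Closing the source signals completion, which propagates down the chain.
        try (SubmissionPublisher<String> source = new SubmissionPublisher<>()) {
            MapProcessor<String, String> trim = new MapProcessor<String, String>(String::trim);
            MapProcessor<String, String> upper = new MapProcessor<String, String>(String::toUpperCase);
            source.subscribe(trim);
            trim.subscribe(upper);
            upper.subscribe(sink);
            lines.forEach(source::submit);
        }
        try {
            sink.done.await();            // wait for the asynchronous delivery to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sink.items;
    }
}
```

Each subscriber asks for items with `request(n)`, so a slow unit limits how fast the units before it may emit. That demand signalling is what the dedicated libraries build their richer operators on.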
Sounds like you should use Step Functions to chain the Lambda functions together.