
GroupBy of stream by aggregateId (Haskell / concurrency streaming)

Context: I'm implementing an app in CQRS and I'm trying to optimize the processing of commands (basically one stream per aggregate ID)...

Problem: I would like to have a first stream that receives all the commands and dispatches them by aggregate ID onto different threads, so that:

1) Commands within an aggregate are processed serially.
2) Aggregates process their commands independently (in parallel).

Solution: I'm basically trying to perform a groupBy on the stream by aggregate ID. To help a bit, I simplified the example as follows:

module Sandbox where

import Streamly
import qualified Streamly.Prelude as S
import Control.Concurrent
import Control.Monad.IO.Class (MonadIO(..))

main :: IO ()
main = do
         runStream $ parallely $ S.fromList getAggregateIds |& S.mapM (\x -> do
            threadId <- myThreadId
            liftIO $ putStrLn $ (show threadId) ++ "  value " ++ (show x))


getAggregateIds :: [Integer]
getAggregateIds = [1..3] <> [1..3]

This script displays the following result:

ThreadId 17  value 1
ThreadId 15  value 2
ThreadId 19  value 3
ThreadId 13  value 1
ThreadId 16  value 3
ThreadId 18  value 2

What I'm expecting is something like this (no particular order, just that a value x is always processed on the same thread x1):

ThreadId X1  value X
ThreadId Y1  value Y
ThreadId Z1  value Z
ThreadId X1  value X
ThreadId Y1  value Y
ThreadId Z1  value Z

Thanks!!

In the code above, parallely creates one Haskell thread for each element of the list getAggregateIds, which is [1,2,3,1,2,3]. parallely does not care that the list contains duplicate elements: it simply starts a thread for each one.

In principle, parallely could allocate only a small number of Haskell threads and reuse them later on (possibly for the same duplicate ID, or for another one), but there would be no performance gain in doing so. Indeed, the crucial point here is that a Haskell thread is being allocated, not an OS thread.

Haskell threads are very lightweight: they use very little memory, so they are very cheap to create and dispose of. Trying to reuse them could well lead to worse performance.
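
To get a feel for how cheap Haskell threads are, here is a minimal sketch that forks a large number of them using only base. The helper name spawnMany and the count of 10000 are made up for illustration; nothing here comes from Streamly.

```haskell
import Control.Concurrent (forkIO, myThreadId)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM)

-- Hypothetical helper: fork n Haskell threads. Each thread reports its own
-- ThreadId (as a String) through a dedicated MVar.
spawnMany :: Int -> IO [String]
spawnMany n = do
  results <- forM [1 .. n] $ \_ -> do
    result <- newEmptyMVar
    _ <- forkIO (myThreadId >>= putMVar result . show)
    pure result
  -- Wait for every thread to report back.
  mapM takeMVar results

main :: IO ()
main = do
  ids <- spawnMany 10000
  putStrLn ("spawned " ++ show (length ids) ++ " Haskell threads")
```

Forking this many OS threads would be a noticeable cost; forking this many Haskell threads normally is not, which is why creating one per stream element is a reasonable default.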

Further, the Haskell runtime can execute many Haskell threads on a single OS thread. Usually, the runtime keeps a small pool of OS threads around, and Haskell threads are mapped onto those. Since OS threads are not as lightweight, they are indeed reused between Haskell threads.

Finally, note that the ThreadId is the name of the Haskell thread, not the OS one, so it's normal to see no reuse of those IDs.
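
If what matters is the ordering guarantee from the question (commands of one aggregate handled serially, different aggregates handled in parallel), one possible approach is to give each aggregate ID its own long-lived Haskell thread with its own inbox. The sketch below does this with Chan and MVar from base plus Data.Map from the containers package; it does not use Streamly, and the names Entry, worker and dispatch are invented for this illustration rather than taken from any library.

```haskell
import Control.Concurrent (forkIO, myThreadId)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar
import Control.Monad (foldM, forM_)
import qualified Data.Map.Strict as Map

-- One processed command: (worker thread name, aggregate ID, command number).
type Entry = (String, Integer, Int)

-- Worker loop (hypothetical): handles the commands of a single aggregate
-- one at a time, in arrival order. Nothing is the stop signal.
worker :: MVar [Entry] -> Integer -> Chan (Maybe Int) -> MVar () -> IO ()
worker logVar aggId inbox done = loop
  where
    loop = do
      msg <- readChan inbox
      case msg of
        Nothing -> putMVar done ()
        Just n -> do
          tid <- myThreadId
          modifyMVar_ logVar (pure . ((show tid, aggId, n) :))
          loop

-- Route each command to its aggregate's worker, forking the worker the
-- first time an aggregate ID is seen.
dispatch :: [(Integer, Int)] -> IO [Entry]
dispatch commands = do
  logVar <- newMVar []
  let route workers (aggId, n) =
        case Map.lookup aggId workers of
          Just (inbox, _) -> writeChan inbox (Just n) >> pure workers
          Nothing -> do
            inbox <- newChan
            done <- newEmptyMVar
            _ <- forkIO (worker logVar aggId inbox done)
            writeChan inbox (Just n)
            pure (Map.insert aggId (inbox, done) workers)
  workers <- foldM route Map.empty commands
  -- Ask every worker to stop, then wait until all of them have finished.
  forM_ (Map.elems workers) $ \(inbox, _) -> writeChan inbox Nothing
  forM_ (Map.elems workers) $ \(_, done) -> takeMVar done
  reverse <$> readMVar logVar

main :: IO ()
main = do
  entries <- dispatch (zip ([1 .. 3] <> [1 .. 3]) [0 ..])
  forM_ entries $ \(tid, aggId, n) ->
    putStrLn (tid ++ "  aggregate " ++ show aggId ++ "  command " ++ show n)
```

With this layout every line for a given aggregate should show the same ThreadId, while the interleaving between aggregates is left to the scheduler. The point is the per-aggregate queue, not thread reuse as a performance optimization.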

