C# parallelism with multiple tasks
I want to achieve something like this:

- Create a bunch of mongo queries.
- For each query, get the deserialized mongo data set as ObjectA.
- Map all ObjectA to a list of ObjectB (takes some time, given that we're working with several million objects).
- When the mapping is done, upsert all ObjectB (in a loop) to a new mongo db/collection (this also takes some time), then start fetching/mapping the next data set with the next query.
- If the next mapping is done before the last mongo upsert has finished, wait for the upsert to complete, then start writing that data set to mongo.
Now I've been toying around with Task; I think it's the way to go, but I have a hard time working out how to do this.

Should I run two arrays of tasks? How can I create the "dependency" between a "fetch/map task" and a "write to mongo task"?

Any help appreciated, thanks!
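The pipelining described in the steps above can be sketched with plain Tasks by keeping a handle to the in-flight write and awaiting it before starting the next one. This is only a sketch: ObjectA, ObjectB, FetchAsync, Map and UpsertAsync are hypothetical stand-ins for the real mongo query, mapping and bulk-upsert code.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Placeholder types standing in for the deserialized mongo documents.
class ObjectA { }
class ObjectB { }

class Pipeline
{
    // Hypothetical stand-ins for the real mongo fetch, mapping and upsert code.
    Task<List<ObjectA>> FetchAsync(string query) => Task.FromResult(new List<ObjectA>());
    List<ObjectB> Map(List<ObjectA> source) => new List<ObjectB>();
    Task UpsertAsync(List<ObjectB> batch) => Task.CompletedTask;

    public async Task RunAsync(IEnumerable<string> queries)
    {
        Task pendingWrite = Task.CompletedTask;
        foreach (var query in queries)
        {
            // Fetch and map the next data set while the previous write runs.
            var mapped = Map(await FetchAsync(query));

            // Wait for the previous upsert before starting the next one,
            // so at most one write to mongo is in flight at a time.
            await pendingWrite;
            pendingWrite = UpsertAsync(mapped);
        }
        await pendingWrite; // drain the last write
    }
}
```

The "dependency" between a fetch/map task and a write task is simply the await on the previous write before the next one starts.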
Have a look at Dataflow: http://msdn.microsoft.com/en-us/library/hh228603(v=vs.110).aspx

You can create a block for each stage of your process and then chain them together. Then you can post the unique query data to the dataflow in a loop and set the parallelism as needed, e.g. increase the parallelism of stages 1 and 2 as long as you aren't having memory issues, but keep stage 3 non-parallel.
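A minimal sketch of the block-per-stage approach, assuming the System.Threading.Tasks.Dataflow package is referenced; ObjectA, ObjectB, FetchAsync and UpsertAsync are placeholders for the real mongo code, and the degree-of-parallelism values are illustrative:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // System.Threading.Tasks.Dataflow package

class ObjectA { }
class ObjectB { }

class DataflowPipeline
{
    public static async Task RunAsync(IEnumerable<string> queries)
    {
        // Stage 1: run a mongo query and deserialize the result set (parallel).
        var fetch = new TransformBlock<string, List<ObjectA>>(
            query => FetchAsync(query),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        // Stage 2: map each ObjectA to an ObjectB (CPU-bound, also parallel).
        var map = new TransformBlock<List<ObjectA>, List<ObjectB>>(
            batch => batch.ConvertAll(a => new ObjectB()),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

        // Stage 3: upsert to the new db/collection, one batch at a time.
        var write = new ActionBlock<List<ObjectB>>(
            batch => UpsertAsync(batch),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 1 });

        // Chain the stages and propagate completion down the pipeline.
        fetch.LinkTo(map, new DataflowLinkOptions { PropagateCompletion = true });
        map.LinkTo(write, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var query in queries)
            fetch.Post(query);

        fetch.Complete();        // no more queries coming
        await write.Completion;  // wait for the last upsert to finish
    }

    // Hypothetical stand-ins for the real mongo fetch/upsert code.
    static Task<List<ObjectA>> FetchAsync(string query) => Task.FromResult(new List<ObjectA>());
    static Task UpsertAsync(List<ObjectB> batch) => Task.CompletedTask;
}
```

Because the write block is non-parallel, a mapped batch that arrives while an upsert is still running simply queues up, which gives you the "wait for the last upsert, then write" behavior for free.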