Celery for Map-Reduce, or other alternatives in Python?

I have expensive jobs that are well suited to the map-and-reduce model (long story short, the task is to aggregate a few hundred rankings that were previously calculated by a time-consuming algorithm).

I want to parallelize the jobs across a cluster (not merely with multiprocessing), and have focused on two implementations: Celery and Disco. Celery does not support naive map-and-reduce out of the box, and although the "map" part is easily done using TaskSets, how do you implement the "reduce" part efficiently?

(My problem with Disco is that it does not run on Windows, and I have already set up Celery for another part of the program, so running another framework for map-reduce seems rather inelegant.)

Basically, you need to take the output of one task and pass it as the input to another task. Celery is not handy for this.
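To make the "output of one task becomes input of another" idea concrete, here is a minimal sketch using only the standard library. Threads stand in for Celery workers, and the names `map_task`, `reduce_task` and `map_reduce`, as well as the Borda-style point count, are assumptions made for illustration; they are not the asker's algorithm. Celery's canvas primitives `group` and `chord` express a similar fan-out followed by a callback, but this sketch does not depend on them.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_task(ranking):
    """Map step: turn one ranking (best item first) into per-item points."""
    n = len(ranking)
    return {item: n - position for position, item in enumerate(ranking)}


def reduce_task(total, partial):
    """Reduce step: add one partial score table into the running total."""
    merged = dict(total)
    for item, points in partial.items():
        merged[item] = merged.get(item, 0) + points
    return merged


def map_reduce(rankings, max_workers=4):
    # Threads stand in for remote workers in this sketch.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(map_task, rankings))
    # The outputs of the map tasks become the inputs of the reduce step.
    return reduce(reduce_task, partials, {})


rankings = [["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]
scores = map_reduce(rankings)
print(scores)
```

Because `reduce_task` is associative, the partial tables could also be merged pairwise on several workers before a final merge, which matters once there are many map outputs.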

The Celery way would be to have a periodic task scheduler that executes the jobs (the map part) asynchronously and keeps the task references itself if everything runs on a single computer, or otherwise posts the references to a DB backend (Redis, MongoDB, etc.). You might need schedulers to collect these results and apply the reduce function(s) to them.
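The collect-then-reduce loop described above might look roughly like the following. This is a standard-library sketch, not Celery code: `Future` objects play the role of the stored task references, and `expensive_map`, `collect` and the `job-N` ids are hypothetical names. With Celery, the stored reference would typically be a task id that is looked up through the result backend.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def expensive_map(x):
    """Hypothetical map job; the sleep imitates slow work."""
    time.sleep(0.01 * x)
    return x * x


def collect(pending, total=0, poll_interval=0.005):
    """Poll the stored task references and fold finished results into total."""
    while pending:
        finished = [task_id for task_id, fut in pending.items() if fut.done()]
        for task_id in finished:
            total += pending.pop(task_id).result()  # reduce step: a running sum
        if pending:
            time.sleep(poll_interval)
    return total


with ThreadPoolExecutor(max_workers=3) as pool:
    # Keep a reference to every submitted task, keyed by a task id.
    pending = {f"job-{i}": pool.submit(expensive_map, i) for i in range(1, 5)}
    total = collect(pending)

print(total)
```

Folding results in as they finish keeps memory use flat, at the cost of requiring a reduce function that does not depend on arrival order.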

I would suggest running your own Python processes for map and reduce on all the machines in the cluster, storing the results in an in-memory DB such as Redis, and using Celery to execute the map and reduce tasks. Your main process would then collect and combine the results.
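A rough sketch of that layout, under stated assumptions: the `ResultStore` class below is an in-process stand-in for a key-value store such as Redis, threads stand in for worker processes, and the `partial:<id>` key scheme and all function names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


class ResultStore:
    """In-process stand-in for a key-value store such as Redis."""

    def __init__(self):
        self._data = {}
        self._lock = Lock()

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def values(self, prefix):
        with self._lock:
            return [v for k, v in self._data.items() if k.startswith(prefix)]


def map_worker(store, chunk_id, numbers):
    # Each worker writes its partial result under its own key.
    store.set(f"partial:{chunk_id}", sum(numbers))


def main(chunks):
    store = ResultStore()
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(map_worker, store, i, c) for i, c in enumerate(chunks)]
        for future in futures:
            future.result()  # wait for completion and surface worker errors
    # The main process collects the partial results and combines them.
    return sum(store.values("partial:"))


combined = main([[1, 2, 3], [4, 5], [6]])
print(combined)
```

Giving every worker its own key avoids write contention on the store; only the final combine step reads all keys.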
