
worker_limit_reached on parallel map reduce jobs

I have 50 hosts trying to run the map reduce job below on Riak. I am getting the error below, where some of the hosts complain about the worker_limit being reached.

Looking for some insight into whether I can tune the system to avoid this error. I couldn't find much documentation around the worker_limit.

{"phase":0,"error":"[worker_limit_reached]","input":"{<<\"provisionentry\">>,<<\"R89Okhz49SDje0y0qvcnkK7xLH0\">>}","type":"result","stack":"[]"}

with query MapReduce(path='/mapred', reply_headers={'content-length': '144', 'access-control-allow-headers': 'Content-Type', 'server': 'MochiWeb/1.1 WebMachine/1.10.8 (that head fake, tho)', 'connection': 'close', 'date': 'Thu, 27 Aug 2015 00:32:22 GMT', 'access-control-allow-origin': '*', 'access-control-allow-methods': 'POST, GET, OPTIONS', 'content-type': 'application/json'}, verb='POST', headers={'Content-Type': 'application/json'}, data=MapReduceJob(inputs=MapReduceInputs(bucket='provisionentry', key=u'34245e92-ccb5-42e2-a1d9-74ab1c6af8bf', index='testid_bin'), query=[MapReduceQuery(map=MapReduceQuerySpec(language='erlang', module='datatools', function='map_object_key_value'))]))

Map reduce in Riak does not scale well, and so does not work well as part of a user-facing service.

It is suitable for periodic administrative tasks, or for pre-calculations when the number of jobs can be limited.

Since the map phase of the job is a coverage query, each map will need to involve at least 1/n_val (rounded up) of the vnodes, using 1 worker at each. Since you cannot guarantee that the selected coverage sets do not overlap, you should not expect to be able to simultaneously run more map reduce jobs than your worker limit setting.
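The arithmetic above can be sketched as follows (a rough illustration only; the ring size of 64 and n_val of 3 are the usual Riak defaults, assumed here rather than taken from the question):

```python
import math

def coverage_vnodes(ring_size: int, n_val: int) -> int:
    """A coverage query must visit at least ring_size / n_val vnodes
    (rounded up) so that every key is seen exactly once."""
    return math.ceil(ring_size / n_val)

def max_concurrent_jobs(worker_limit: int) -> int:
    """Each map phase takes one pipe worker on every vnode it covers,
    and concurrent jobs' coverage sets may overlap on any given vnode,
    so the per-vnode worker_limit bounds the number of safe
    simultaneous jobs."""
    return worker_limit

# With ring_size=64, n_val=3, worker_limit=50: each job touches
# ceil(64/3) = 22 vnodes, and no more than 50 jobs should run at once.
print(coverage_vnodes(64, 3))    # → 22
print(max_concurrent_jobs(50))   # → 50
```

So 50 hosts each submitting a job already sit right at the default limit, and any overlap in timing will push some vnodes past it.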

The default worker limit is 50 ( https://github.com/basho/riak_pipe/blob/develop/src/riak_pipe_vnode.erl#L86 ), but you can adjust it by setting {worker_limit, 50} in the riak_pipe section of app.config or advanced.config.
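For example, in advanced.config the riak_pipe section might look like this (100 is an illustrative value, not a recommendation; place the tuple alongside any other sections the file already has):

```erlang
%% advanced.config -- a minimal sketch
[
  {riak_pipe, [
    %% Default is 50; raise it if concurrent MapReduce jobs fail
    %% with [worker_limit_reached].
    {worker_limit, 100}
  ]}
].
```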

Keep in mind that each worker is a process, so you may also need to increase the process limit for the Erlang VM.
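On older Riak releases, which keep the VM flags in a vm.args file, the process limit is set with the Erlang VM's +P flag; a hedged sketch (the value shown is illustrative):

```shell
# vm.args fragment -- +P sets the maximum number of Erlang processes
+P 262144
```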

