Erlang rpc:pmap on multiple nodes vs. single node

Question

I'm trying to parallelize my calculations with rpc:pmap, but I'm a bit confused by its performance.

Here is a simple example:

-module(my_module).
-compile(export_all).

do_apply(X, F) -> F(X).

First of all, a test on a single node:

1> timer:tc( rpc, pmap, [{my_module, do_apply}, [fun(X) -> timer:sleep(10), X end], lists:seq(1,10000)] ).
{208198,
 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
  23,24,25,26,27|...]}

After that I connected a second node (a second Erlang shell process on the same OS):

(foo@Stemm.local)24> timer:tc( rpc, pmap, [{my_module, do_apply}, [fun(X) -> timer:sleep(10), X end], lists:seq(1,10000)] ).
{446284,
 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
  23,24,25,26,27|...]}

Finally I connected a third node:

(foo@Stemm.local)26> timer:tc( rpc, pmap, [{my_module, do_apply}, [fun(X) -> timer:sleep(10), X end], lists:seq(1,10000)] ).
{483399,
 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
  23,24,25,26,27|...]}

So I get worse performance with three nodes than with a single node.

I realize that there is some overhead for communication between nodes. But how can I tell in which cases it is better to perform calculations on multiple nodes?

Edit:

My step-by-step test from the shell:

1> c(my_module).
{ok,my_module}
2>  
2> List = lists:seq(1,10000).
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
 23,24,25,26,27,28,29|...]

Test performance on a single node:

3> timer:tc( rpc, pmap, [{my_module, do_apply}, [fun(X)-> timer:sleep(10), X end], List] ).
{207346,
 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
  23,24,25,26,27|...]}

Start distribution so the node joins the network environment:

4> net_kernel:start([one]).
{ok,<0.20066.0>}
(one@Stemm.local)5> erlang:set_cookie(node(), foobar).
true

Add a second node:

(one@Stemm.local)6> net_kernel:connect('two@Stemm.local').
true
(one@Stemm.local)7> 
(one@Stemm.local)7> nodes().
['two@Stemm.local']

Test performance with two nodes:

(one@Stemm.local)8> timer:tc( rpc, pmap, [{my_module, do_apply}, [fun(X)-> timer:sleep(10), X end], List] ).
{510733,
 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
  23,24,25,26,27|...]}

Connect a third node:

(one@Stemm.local)9> net_kernel:connect('three@Stemm.local').
true
(one@Stemm.local)10> nodes().
['two@Stemm.local',
 'three@Stemm.local']

Test performance with three nodes:

(one@Stemm.local)11> timer:tc( rpc, pmap, [{my_module, do_apply}, [fun(X)-> timer:sleep(10), X end], List] ).
{496278,
 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
  23,24,25,26,27|...]}

PS I guess that performance decreases because I'm creating each node as a new Erlang shell process on the same physical machine. But I'm not sure whether I'm right.

Answer 1

You don't need to add nodes to get parallelism in Erlang. Each node can support large numbers of processes locally. pmap is already running your function in parallel. This is easier to see if you make the wait longer:

timer:tc( rpc, pmap, [{my_module, do_apply}, [fun(X) -> timer:sleep(1000), X end], lists:seq(1,10000)] ).
{1158174,
 [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,
  23,24,25,26,27|...]}

If the sleeps were running sequentially on one node, you would expect a minimum wait of 1000 ms * 10000 = 10,000,000 ms (about 2.8 hours). timer:tc reports microseconds, so we only had to wait 1,158,174 µs, roughly 1.16 seconds.
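To make that comparison reproducible, here is a minimal sketch that times a sequential lists:map against rpc:pmap over the same sleep. The module name pmap_timing and both function names are made up for illustration, and it assumes a single node (or that the module is loaded on every connected node).

```erlang
-module(pmap_timing).
-export([compare/2, sleep_and_return/2]).

%% Called once per element by rpc:pmap as sleep_and_return(Elem, SleepMs).
sleep_and_return(X, SleepMs) ->
    timer:sleep(SleepMs),
    X.

%% Returns {SequentialMicros, ParallelMicros} for N elements.
%% Matching on List also checks that both runs return the input unchanged.
compare(N, SleepMs) ->
    List = lists:seq(1, N),
    {SeqUs, List} = timer:tc(lists, map,
                             [fun(X) -> sleep_and_return(X, SleepMs) end, List]),
    {ParUs, List} = timer:tc(rpc, pmap,
                             [{?MODULE, sleep_and_return}, [SleepMs], List]),
    {SeqUs, ParUs}.
```

For example, pmap_timing:compare(100, 10) returns both timings in microseconds; the sequential one cannot be lower than 1,000,000 because 100 sleeps of 10 ms run back to back.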

You are creating 3 separate Erlang VMs and connecting them to each other, then calling the parallel map from one of them. rpc:pmap spreads its calls over node() and nodes(), so with extra VMs on the same machine part of the work is shipped to the other VMs: arguments and results have to be serialized and sent between nodes, and all the VMs still compete for the same physical resources. With your current setup the additional VMs can only hurt performance.
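One way to check where the calls are actually evaluated is to have each call return node() and count the results. This is only a sketch; the module name pmap_where and the function nodes_used/1 are hypothetical.

```erlang
-module(pmap_where).
-export([nodes_used/1]).

%% Run N trivial calls through rpc:pmap and count how many were evaluated
%% on each node.
%% rpc:pmap evaluates apply(M, F, [Elem | ExtraArgs]). With {erlang, apply},
%% Elem = fun erlang:node/0 and ExtraArgs = [[]], each call becomes
%% erlang:apply(fun erlang:node/0, []), i.e. node() on the executing VM.
nodes_used(N) ->
    Calls = lists:duplicate(N, fun erlang:node/0),
    Where = rpc:pmap({erlang, apply}, [[]], Calls),
    lists:foldl(fun(Node, Acc) ->
                        maps:update_with(Node, fun(C) -> C + 1 end, 1, Acc)
                end, #{}, Where).
```

Calling pmap_where:nodes_used(12) before and after connecting the other shells returns a map from node name to the number of calls that ran there.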

Multiple nodes will only help performance if they are run on different physical resources.
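As a rough check of what "the same physical resources" means here: by default a single Erlang VM starts one scheduler per logical processor, so one node can already keep every core busy. A minimal sketch that reports this for the current node (the module name vm_cores is hypothetical):

```erlang
-module(vm_cores).
-export([summary/0]).

%% logical_processors may be the atom 'unknown' on some platforms;
%% schedulers_online is always a positive integer.
summary() ->
    #{node => node(),
      logical_processors => erlang:system_info(logical_processors),
      schedulers_online => erlang:system_info(schedulers_online)}.
```

If schedulers_online already matches the number of logical processors, a second VM on the same machine has no spare cores to use.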

solution1
3 ACCEPTED 2012-09-13 23:32:44
