Erlang OTP supervisor crashing

I'm working through the Erlang documentation, trying to understand the basics of setting up an OTP gen_server and supervisor. Whenever my gen_server crashes, my supervisor crashes as well. In fact, whenever I get an error on the command line, my supervisor crashes.

I expect the gen_server to be restarted when it crashes. I expect command line errors to have no bearing whatsoever on my server components. My supervisor shouldn't be crashing at all.

The code I'm working with is a basic "echo server" that replies with whatever you send in, and a one_for_one supervisor that will restart the echo_server at most 5 times per minute. My code:

echo_server.erl

-module(echo_server).
-behaviour(gen_server).

-export([start_link/0]).
-export([echo/1, crash/0]).
-export([init/1, handle_call/3, handle_cast/2]).

start_link() ->
    gen_server:start_link({local, echo_server}, echo_server, [], []).

%% public api
echo(Text) ->
    gen_server:call(echo_server, {echo, Text}).
crash() ->
    gen_server:call(echo_server, crash).

%% behaviours
init(_Args) ->
    {ok, none}.
handle_call(crash, _From, State) ->
    %% X is already bound to 1, so X=2 fails with badmatch: a deliberate crash
    X=1,
    {reply, X=2, State};
handle_call({echo, Text}, _From, State) ->
    {reply, Text, State}.
handle_cast(_, State) ->
    {noreply, State}.

echo_sup.erl

-module(echo_sup).
-behaviour(supervisor).
-export([start_link/0]).
-export([init/1]).

start_link() ->
    supervisor:start_link(echo_sup, []).
init(_Args) ->
    {ok,  {{one_for_one, 5, 60},
       [{echo_server, {echo_server, start_link, []},
             permanent, brutal_kill, worker, [echo_server]}]}}.

Compiled using erlc *.erl, and here's a sample run:

Erlang R13B01 (erts-5.7.2) [source] [smp:2:2] [rq:2] [async-threads:0] [kernel-poll:false]

Eshell V5.7.2  (abort with ^G)
1> echo_sup:start_link().
{ok,<0.37.0>}
2> echo_server:echo("hi").
"hi"
3> echo_server:crash().   

=ERROR REPORT==== 5-May-2010::10:05:54 ===
** Generic server echo_server terminating 
** Last message in was crash
** When Server state == none
** Reason for termination == 
** {'function not exported',
       [{echo_server,terminate,
            [{{badmatch,2},
              [{echo_server,handle_call,3},
               {gen_server,handle_msg,5},
               {proc_lib,init_p_do_apply,3}]},
             none]},
        {gen_server,terminate,6},
        {proc_lib,init_p_do_apply,3}]}

=ERROR REPORT==== 5-May-2010::10:05:54 ===
** Generic server <0.37.0> terminating 
** Last message in was {'EXIT',<0.35.0>,
                           {{{undef,
                                 [{echo_server,terminate,
                                      [{{badmatch,2},
                                        [{echo_server,handle_call,3},
                                         {gen_server,handle_msg,5},
                                         {proc_lib,init_p_do_apply,3}]},
                                       none]},
                                  {gen_server,terminate,6},
                                  {proc_lib,init_p_do_apply,3}]},
                             {gen_server,call,[echo_server,crash]}},
                            [{gen_server,call,2},
                             {erl_eval,do_apply,5},
                             {shell,exprs,6},
                             {shell,eval_exprs,6},
                             {shell,eval_loop,3}]}}
** When Server state == {state,
                            {<0.37.0>,echo_sup},
                            one_for_one,
                            [{child,<0.41.0>,echo_server,
                                 {echo_server,start_link,[]},
                                 permanent,brutal_kill,worker,
                                 [echo_server]}],
                            {dict,0,16,16,8,80,48,
                                {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                 []},
                                {{[],[],[],[],[],[],[],[],[],[],[],[],[],[],
                                  [],[]}}},
                            5,60,
                            [{1273,79154,701110}],
                            echo_sup,[]}
** Reason for termination == 
** {{{undef,[{echo_server,terminate,
                          [{{badmatch,2},
                            [{echo_server,handle_call,3},
                             {gen_server,handle_msg,5},
                             {proc_lib,init_p_do_apply,3}]},
                           none]},
             {gen_server,terminate,6},
             {proc_lib,init_p_do_apply,3}]},
     {gen_server,call,[echo_server,crash]}},
    [{gen_server,call,2},
     {erl_eval,do_apply,5},
     {shell,exprs,6},
     {shell,eval_exprs,6},
     {shell,eval_loop,3}]}
** exception exit: {{undef,
                        [{echo_server,terminate,
                             [{{badmatch,2},
                               [{echo_server,handle_call,3},
                                {gen_server,handle_msg,5},
                                {proc_lib,init_p_do_apply,3}]},
                              none]},
                         {gen_server,terminate,6},
                         {proc_lib,init_p_do_apply,3}]},
                    {gen_server,call,[echo_server,crash]}}
     in function  gen_server:call/2
4> echo_server:echo("hi").
** exception exit: {noproc,{gen_server,call,[echo_server,{echo,"hi"}]}}
     in function  gen_server:call/2
5>

The problem with testing supervisors from the shell is that the supervisor process is linked to the shell process. When the gen_server process crashes, the failed gen_server:call/2 raises an exit in the shell process, which crashes and gets restarted. Because the supervisor is linked to the shell, the shell's exit signal takes the supervisor down as well.

To avoid the problem, add something like this to the supervisor:

start_in_shell_for_testing() ->
    {ok, Pid} = supervisor:start_link(echo_sup, []),
    unlink(Pid).
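For context, here is a minimal sketch of how that helper might sit in echo_sup.erl. It assumes the function is added to the export list, and it returns {ok, Pid} rather than the result of unlink/1, which is a convenience of this sketch and not part of the suggestion above.

```erlang
-module(echo_sup).
-behaviour(supervisor).
-export([start_link/0, start_in_shell_for_testing/0]).
-export([init/1]).

start_link() ->
    supervisor:start_link(echo_sup, []).

%% For shell experiments only: start the supervisor, then drop the link
%% to the calling (shell) process so a shell crash does not reach it.
%% Returning {ok, Pid} is an assumption of this sketch.
start_in_shell_for_testing() ->
    {ok, Pid} = supervisor:start_link(echo_sup, []),
    true = unlink(Pid),
    {ok, Pid}.

init(_Args) ->
    {ok, {{one_for_one, 5, 60},
          [{echo_server, {echo_server, start_link, []},
            permanent, brutal_kill, worker, [echo_server]}]}}.
```

With the supervisor unlinked, echo_server:crash() still raises an exit in the shell, but echo_server:echo("hi") should work again once the worker has been restarted.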

I'd suggest debugging/tracing your application to check what's going on. It's very helpful for understanding how things work in OTP.

In your case, you might want to do the following.

Start the tracer:

dbg:tracer().

Trace all function calls for your supervisor and your gen_server:

dbg:p(all,c).
dbg:tpl(echo_server, x).
dbg:tpl(echo_sup, x).

Check which messages the processes are passing:

dbg:p(new, m).

See what's happening to your processes (crashes, etc.):

dbg:p(new, p).
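The same steps could be wrapped in a small helper module so they can be switched on and off with one call. This is only a sketch: the module name echo_trace and its start/0 and stop/0 functions are hypothetical, and it assumes the runtime_tools application (which provides dbg) is available. Since the new flag only covers processes spawned after the call, it is probably best run before starting the supervisor.

```erlang
-module(echo_trace).
-export([start/0, stop/0]).

%% Hypothetical helper: runs the dbg calls listed above in order.
start() ->
    {ok, _Tracer} = dbg:tracer(),
    %% Call tracing for all processes, limited to the patterns set below.
    {ok, _} = dbg:p(all, c),
    %% Local and exported functions, with return values and exceptions.
    {ok, _} = dbg:tpl(echo_server, x),
    {ok, _} = dbg:tpl(echo_sup, x),
    %% Messages and process events for processes spawned from now on.
    {ok, _} = dbg:p(new, m),
    {ok, _} = dbg:p(new, p),
    ok.

%% Shut the tracer down when finished.
stop() ->
    dbg:stop().
```

Call echo_trace:start() before echo_sup:start_link(), and echo_trace:stop() when you are done reading the output.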

For more information about tracing:

http://www.erlang.org/doc/man/dbg.html

http://aloiroberto.wordpress.com/2009/02/23/tracing-erlang-functions/

Hope this helps in this and future situations.

HINT: The gen_server behaviour expects the callback terminate/2 to be defined and exported ;)
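A minimal sketch of echo_server.erl with the missing callback added. The handle_info/2 and code_change/3 clauses are additions of this sketch to round out the callback set; they are not in the question's code. As far as I know, later OTP releases treat terminate/2 as optional, but the undef in the error report shows that the release used here calls it when the server goes down.

```erlang
-module(echo_server).
-behaviour(gen_server).

-export([start_link/0]).
-export([echo/1, crash/0]).
-export([init/1, handle_call/3, handle_cast/2,
         handle_info/2, terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link({local, echo_server}, echo_server, [], []).

%% public api
echo(Text) ->
    gen_server:call(echo_server, {echo, Text}).
crash() ->
    gen_server:call(echo_server, crash).

%% behaviours
init(_Args) ->
    {ok, none}.

handle_call(crash, _From, State) ->
    %% Deliberate badmatch: X is already bound to 1.
    X = 1,
    {reply, X = 2, State};
handle_call({echo, Text}, _From, State) ->
    {reply, Text, State}.

handle_cast(_, State) ->
    {noreply, State}.

%% Added in this sketch: ignore any other message.
handle_info(_Info, State) ->
    {noreply, State}.

%% The callback the error report complains about; nothing to clean up here.
terminate(_Reason, _State) ->
    ok.

%% Added in this sketch: no state conversion on code upgrade.
code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

With terminate/2 in place the exit reason becomes the plain badmatch instead of undef, which makes the crash easier to read.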

UPDATE: Once terminate/2 is defined, the reason for the crash is evident from the trace. This is how it looks:

We (<0.75.0>) call the crash/0 function. The call is received by the gen_server (<0.78.0>).

(<0.75.0>) call echo_server:crash()
(<0.75.0>) <0.78.0> ! {'$gen_call',{<0.75.0>,#Ref<0.0.0.358>},crash}
(<0.78.0>) << {'$gen_call',{<0.75.0>,#Ref<0.0.0.358>},crash}
(<0.78.0>) call echo_server:handle_call(crash,{<0.75.0>,#Ref<0.0.0.358>},none)

Uh, there's a problem in handle_call/3. We have a badmatch...

(<0.78.0>) exception_from {echo_server,handle_call,3} {error,{badmatch,2}}

The terminate function is called. The server exits and gets unregistered.

(<0.78.0>) call echo_server:terminate({{badmatch,2},
 [{echo_server,handle_call,3},
  {gen_server,handle_msg,5},
  {proc_lib,init_p_do_apply,3}]},none)
(<0.78.0>) returned from echo_server:terminate/2 -> ok
(<0.78.0>) exit {{badmatch,2},
 [{echo_server,handle_call,3},
  {gen_server,handle_msg,5},
  {proc_lib,init_p_do_apply,3}]}
(<0.78.0>) unregister echo_server

The supervisor (<0.77.0>) receives the exit signal from the gen_server and does its job:

(<0.77.0>) << {'EXIT',<0.78.0>,
                      {{badmatch,2},
                       [{echo_server,handle_call,3},
                        {gen_server,handle_msg,5},
                        {proc_lib,init_p_do_apply,3}]}}
(<0.77.0>) getting_unlinked <0.78.0>
(<0.75.0>) << {'DOWN',#Ref<0.0.0.358>,process,<0.78.0>,
                      {{badmatch,2},
                       [{echo_server,handle_call,3},
                        {gen_server,handle_msg,5},
                        {proc_lib,init_p_do_apply,3}]}}
(<0.77.0>) call echo_server:start_link()

Well, it tries... but then what Filippo said happens.

On the other hand, if you have to test the restart strategy from the console, start the supervisor from the console and use pman to kill the worker process.

You will see pman refresh with the same supervisor pid but a different worker pid each time, up to the limit set by the MaxR and MaxT values in your restart strategy.
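The same check can be done without a GUI tool (as far as I know, pman is no longer shipped with recent OTP releases). The following is a sketch only: the module name restart_check and its function kill_and_wait/1 are hypothetical, and the retry count and sleep interval are arbitrary.

```erlang
-module(restart_check).
-export([kill_and_wait/1]).

%% Kill the process registered as Name, then poll until a different pid
%% is registered under the same name (i.e. the supervisor restarted it).
kill_and_wait(Name) ->
    Old = whereis(Name),
    true = is_pid(Old),
    exit(Old, kill),
    wait_for_new(Name, Old, 50).

%% Give up after the retries are used up (about one second in total).
wait_for_new(_Name, _Old, 0) ->
    {error, not_restarted};
wait_for_new(Name, Old, Retries) ->
    case whereis(Name) of
        New when is_pid(New), New =/= Old ->
            {ok, Old, New};
        _ ->
            timer:sleep(20),
            wait_for_new(Name, Old, Retries - 1)
    end.
```

After echo_sup:start_link(), calling restart_check:kill_and_wait(echo_server) should return {ok, OldPid, NewPid} with two different pids. Killing the worker this way does not crash the shell, so the supervisor stays up. More than 5 kills within 60 seconds should exceed the {one_for_one, 5, 60} limit, at which point the supervisor itself terminates.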
