
After an unexpected shutdown, RabbitMQ can only start if persistent data is deleted

I'm using a single RabbitMQ instance (not a cluster), and all the queues declared are durable and all the messages sent are persistent.

I'm sending messages to RabbitMQ continuously. To simulate a crash, I kill the RabbitMQ process and then start the RabbitMQ service again.

The problem I face is that after the second unexpected shutdown, the RabbitMQ service fails to start normally.

rabbitmq-service.bat start reports success:

C:\Program Files\erl7.1\erts-7.1\bin\erlsrv: Service RabbitMQ started.

However, the service is not actually running. rabbitmqctl.bat status outputs:

Error: unable to connect to node 'rabbit@HCE-G971WY1': nodedown

Any suggestions as to why the service fails to start?

If I delete all persistent data (\AppData\Roaming\RabbitMQ\db), RabbitMQ starts normally, but all my messages and queues are lost.

I'm using:

  • Windows 7
  • RabbitMQ 3.6.0 on Erlang 18.1

Here is the log file:

    =ERROR REPORT==== 18-Feb-2016::14:46:03 ===
    ** Generic server <0.154.0> terminating
    ** Last message in was {'$gen_cast',
                               {submit_async,
                                   #Fun<rabbit_queue_index.32.56515753>}}
    ** When Server state == undefined
    ** Reason for termination == 
    ** {{case_clause,{{true,<<189,10,73,71,182,201,144,167,110,15,200,171,200,160,
                              ...101>>},
                      no_del,no_ack}},
        [{rabbit_queue_index,action_to_entry,3,
                             [{file,"src/rabbit_queue_index.erl"},{line,780}]},
         {rabbit_queue_index,add_to_journal,3,
                             [{file,"src/rabbit_queue_index.erl"},{line,757}]},
         {rabbit_queue_index,add_to_journal,3,
                             [{file,"src/rabbit_queue_index.erl"},{line,748}]},
         {rabbit_queue_index,parse_journal_entries,2,
                             [{file,"src/rabbit_queue_index.erl"},{line,895}]},
         {rabbit_queue_index,recover_journal,1,
                             [{file,"src/rabbit_queue_index.erl"},{line,869}]},
         {rabbit_queue_index,scan_segments,3,
                             [{file,"src/rabbit_queue_index.erl"},{line,692}]},
         {rabbit_queue_index,queue_index_walker_reader,2,
                             [{file,"src/rabbit_queue_index.erl"},{line,680}]},
         {rabbit_queue_index,'-queue_index_walker/1-fun-0-',2,
                             [{file,"src/rabbit_queue_index.erl"},{line,661}]}]}

    =INFO REPORT==== 18-Feb-2016::14:46:03 ===
    Error description:
       {could_not_start,rabbit,
           {{badmatch,
                {error,
                    {{{{case_clause,
                           {{true,
                                <<189,10,73,71,182,201,144,167,110,15,200,171,200,
                                  ...101>>},
                            no_del,no_ack}},
                       [{rabbit_queue_index,action_to_entry,3,
                            [{file,"src/rabbit_queue_index.erl"},{line,780}]},
                        {rabbit_queue_index,add_to_journal,3,
                            [{file,"src/rabbit_queue_index.erl"},{line,757}]},
                        {rabbit_queue_index,add_to_journal,3,
                            [{file,"src/rabbit_queue_index.erl"},{line,748}]},
                        {rabbit_queue_index,parse_journal_entries,2,
                            [{file,"src/rabbit_queue_index.erl"},{line,895}]},
                        {rabbit_queue_index,recover_journal,1,
                            [{file,"src/rabbit_queue_index.erl"},{line,869}]},
                        {rabbit_queue_index,scan_segments,3,
                            [{file,"src/rabbit_queue_index.erl"},{line,692}]},
                        {rabbit_queue_index,queue_index_walker_reader,2,
                            [{file,"src/rabbit_queue_index.erl"},{line,680}]},
                        {rabbit_queue_index,'-queue_index_walker/1-fun-0-',2,
                            [{file,"src/rabbit_queue_index.erl"},{line,661}]}]},
                      {gen_server2,call,[<0.211.0>,out,infinity]}},
                     {child,undefined,msg_store_persistent,
                         {rabbit_msg_store,start_link,
                             [msg_store_persistent,
                              "c:/Users/212303924/AppData/Roaming/RabbitMQ/db/rabbit@HCE-G971WY1-mnesia",
                              [],
                              {#Fun<rabbit_queue_index.2.56515753>,
                               {start,
                                   [{resource,<<"/">>,queue,
                                        <<"execution-processed-request">>},
                                    {resource,<<"/">>,queue,
                                        <<"execution-result">>},
                                    {resource,<<"/">>,queue,
                                        <<"job-result-queue-mirror">>}]}}]},
                         transient,4294967295,worker,
                         [rabbit_msg_store]}}}},
            [{rabbit_variable_queue,start_msg_store,2,
                 [{file,"src/rabbit_variable_queue.erl"},{line,458}]},
             {rabbit_variable_queue,start,1,
                 [{file,"src/rabbit_variable_queue.erl"},{line,440}]},
             {rabbit_priority_queue,start,1,
                 [{file,"src/rabbit_priority_queue.erl"},{line,92}]},
             {rabbit_amqqueue,recover,0,
                 [{file,"src/rabbit_amqqueue.erl"},{line,234}]},
             {rabbit,recover,0,[{file,"src/rabbit.erl"},{line,538}]},
             {rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,
                 [{file,"src/rabbit_boot_steps.erl"},{line,49}]},
             {rabbit_boot_steps,run_step,2,
                 [{file,"src/rabbit_boot_steps.erl"},{line,49}]},
             {rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,
                 [{file,"src/rabbit_boot_steps.erl"},{line,26}]}]}}

SOLVED

A Pivotal engineer answered the question here: https://groups.google.com/forum/#!topic/rabbitmq-users/f0akEFlQATU

The problem was that, after a force kill, RabbitMQ could not read the queue index on startup and therefore failed to start.

The solution was to change the RabbitMQ config.

Changing the value of rabbit.queue_index_max_journal_entries from the default 65536 to 64 solved the issue. This setting controls how many journal entries accumulate before the queue index is flushed to disk, so a lower value means the index is flushed more often.
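For reference, here is a minimal sketch of how this setting might look in the classic Erlang-term config file (rabbitmq.config). The file location is an assumption: on Windows it normally lives under %APPDATA%\RabbitMQ, but your installation may differ. Only the queue_index_max_journal_entries key and the value 64 come from the fix described above.

```erlang
%% rabbitmq.config (classic Erlang-term format)
%% Assumption: this is the only setting in the file; merge it into the
%% existing {rabbit, [...]} section if you already have one.
[
  {rabbit, [
    %% Flush the queue index journal to disk after 64 entries
    %% instead of the default mentioned above.
    {queue_index_max_journal_entries, 64}
  ]}
].
```

The trailing period is required, since the file is a single Erlang term. After restarting the service, rabbitmqctl.bat environment should list the value actually in effect. If the change does not seem to be picked up on Windows, reinstalling the service with rabbitmq-service.bat may be necessary.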

Note that this might reduce throughput, but in my case safety and being able to start after a force kill were more important.
