
parallel processing of a queue

I'm trying to find a way to process a queue on several threads while dynamically adjusting the number of consumers. The task itself is well known: multiple producers create messages and submit them to a queue, and multiple consumers process messages from the queue. I have considered different ways of doing this using components such as System.Collections.Queue.Synchronized, System.Collections.Concurrent.ConcurrentQueue and System.Collections.Concurrent.BlockingCollection, but I can't decide how to do it properly with maximum efficiency, so I would be glad to hear your ideas.
Here are more details:

  • The message rate is expected to be really intensive on some occasions, but the handling is going to be relatively straightforward;
  • I have no idea how many consumers I should have;
  • I want the process to adjust the number of current consumers, instead of having them blocked, depending on the number of messages enqueued. That is, I want to start an additional consumer for, e.g., each hundred messages, and one of the consumers should halt when the number of enqueued messages is 50 less than the number that was needed to start it; e.g. the third consumer will be started when the number of messages has grown over 300, and it should halt when it drops to 250.
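
The start and stop thresholds in the last bullet can be written down as a small helper. This is only a sketch that follows the example numbers literally (100 messages per consumer, a margin of 50); the class and method names are hypothetical, and a real version would presumably keep at least one consumer alive for any non-empty queue, which this sketch does not do.

```csharp
using System;

static class ConsumerThresholds
{
    // Hypothetical constants, taken from the example numbers in the list above.
    public const int MessagesPerConsumer = 100;
    public const int StopMargin = 50;

    // Queue length that must be exceeded before consumer number n (1-based) is started.
    public static int StartThreshold(int n) => n * MessagesPerConsumer;

    // Queue length at or below which consumer number n should halt.
    public static int StopThreshold(int n) => StartThreshold(n) - StopMargin;

    // Returns the consumer count that should be running after looking at the queue length.
    // The gap between the two thresholds keeps a consumer from being started and
    // stopped repeatedly while the queue length hovers around a multiple of 100.
    public static int NextConsumerCount(int current, int queueLength)
    {
        if (queueLength > StartThreshold(current + 1)) return current + 1;
        if (current > 0 && queueLength <= StopThreshold(current)) return current - 1;
        return current;
    }
}
```

With these rules, three running consumers stay at three anywhere between 251 and 400 enqueued messages, and the count drops to two only once the queue length reaches 250.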

This is the idea. I thought about wrapping the ConcurrentQueue in a class that encapsulates the Enqueue method, checks the number of messages after enqueuing, and decides whether to start an additional consumer. The consumer, in turn, should have a check within its loop that decides whether to halt. I expect you can suggest some more interesting solutions.

By the way, one situation I still don't know how to handle is, theoretically, when the last message is being enqueued at the same time as the last consumer halts. Another situation is also about halting: several consumers will halt if they reach the halt check at the same time. How should I deal with these situations?

To demonstrate what I mean, consider this sample:

class MessageController
{
    private BlockingCollection<IMessage> messageQueue = new BlockingCollection<IMessage>();

    int amountOfConsumers;

    public void Enqueue(IMessage message)
    {
        messageQueue.Add(message); // point two

        if (Math.Floor((double)messageQueue.Count / 100)+1 > amountOfConsumers) // point three
        {
            Task.Factory.StartNew(() =>
            {
                IMessage msg;
                while ((messageQueue.Count > 0) && (Math.Floor((double)((messageQueue.Count + 50) / 100)) + 1 >= amountOfConsumers)) //point one
                {
                    msg = messageQueue.Take();
                    //process msg...
                }

                ConsumerQuit(); // point four
            });

            Interlocked.Increment(ref amountOfConsumers);
        }
    }

    public void ConsumerQuit()
    {
        Interlocked.Decrement(ref amountOfConsumers);
    }
}

Now that I can point to specific code lines, these are the questions:

  • The last consumer finds that there are no messages enqueued (@point one), and before it calls the ConsumerQuit method, the last message arrives and is enqueued (@point two). The check for additional consumers is then done (@point three), and it turns out that there is still a consumer working; because one consumer for a single message is more than enough, nothing happens. Then ConsumerQuit is finally called (@point four), and I have a message stuck in the queue.
    ConsumerTask                       | LastMessageThread
    ------------------------------------------------------------------------
    @point one(messageQueue.Count=0)   | @point two
    no time                            | @point three(amountOfConsumers=1)
    @point four                        | ended;
    ended;                             | ended;
  • Several consumers reach the "point one" check simultaneously when only one of them should halt (e.g. messageQueue.Count is 249). Several of them will halt, because before ConsumerQuit is called on one of them, several others will have done this check as well.
    ConsumerTask1                   | ConsumerTask2 | ConsumerTask3  | ConsumerTask4 |
    ----------------------------------------------------------------------------------
    @point one(.Count=249;amount=4) | no time       | no time        | @point one    |
    no time                         | @point one    | processing msg | @point four   |
    @point four                     | no time       | @point one     | ended;        |
    ended;                          | @point four   | processing msg | ended;        |
    ended;                          | ended;        | ...            | ended;        |

Here, in the case where the last message has already been enqueued, we have one consumer task left that has to handle 249 messages alone. The worst case, however, is if all of them halt: after the last message, potentially hundreds of messages will be stuck.

It seems that I've finally come up with a solution, though I'm not sure about the performance. Please consider the following code; any feedback will be much appreciated! I still hope to see other solutions or ideas, even if they are completely different and require major changes in approach. This is the objective: "a way to process a queue in several threads, dynamically adjusting the number of consumers".

class MessageController
{
    private BlockingCollection<IMessage> messageQueue = new BlockingCollection<IMessage>();

    private ManualResetEvent mre = new ManualResetEvent(true);

    private int amountOfConsumers;

    object o = new object();

    public void Enqueue(IMessage message)
    {
        messageQueue.Add(message);

        mre.WaitOne();
        if (Math.Floor((double)messageQueue.Count / 100)+1 > amountOfConsumers)
        {
            Interlocked.Increment(ref amountOfConsumers);

            var task = Task.Factory.StartNew(() =>
            {
                IMessage msg;
                bool repeat = true;

                while (repeat)
                {
                    while ((messageQueue.Count > 0) && (Math.Floor((double)((messageQueue.Count + 50) / 100)) + 1 >= amountOfConsumers))
                    {
                        msg = messageQueue.Take();
                        //process msg...
                    }

                    lock (o)
                    {
                        mre.Reset();

                        if ((messageQueue.Count == 0) || (Math.Ceiling((double)((messageQueue.Count + 51) / 100)) < amountOfConsumers))
                        {
                            ConsumerQuit();
                            repeat = false;
                        }

                        mre.Set();
                    }
                }
            });
        }
    }

    public void ConsumerQuit()
    {
        Interlocked.Decrement(ref amountOfConsumers);
    }
}

My initial thoughts are that you are designing this backwards.

When looking at parallelism, you may not always gain efficiency by adding more threads to a single task. Sometimes the best number is equal to or less than the number of cores on the machine that you are using. The reason for this is that additional threads create more overhead through lock contention and thread switching.

By adding more consumers you may find that the consumption rate actually decreases instead of increases.
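
As a rough illustration of that point, here is a minimal sketch of a fixed pool of consumers sized to the machine rather than to the queue length. It is an assumption-laden example, not a tuned implementation: FixedConsumerPool and Drain are hypothetical names, and plain int items stand in for IMessage, which is not defined here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class FixedConsumerPool
{
    // Drains 'queue' with a fixed number of consumers and returns how many items were handled.
    public static int Drain(BlockingCollection<int> queue, int consumerCount, Action<int> handle)
    {
        int handled = 0;

        Task[] consumers = Enumerable.Range(0, consumerCount)
            .Select(_ => Task.Factory.StartNew(() =>
            {
                // GetConsumingEnumerable blocks while the queue is empty and ends
                // once CompleteAdding has been called and the queue has been drained.
                foreach (int item in queue.GetConsumingEnumerable())
                {
                    handle(item);
                    Interlocked.Increment(ref handled);
                }
            }, TaskCreationOptions.LongRunning))
            .ToArray();

        Task.WaitAll(consumers);
        return handled;
    }
}
```

Calling FixedConsumerPool.Drain(queue, Environment.ProcessorCount, handler) after the producers have called queue.CompleteAdding() processes every item exactly once. Because idle consumers simply block inside GetConsumingEnumerable, there is no halt check to race on.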

One thing to consider is how long it takes to process a message. If this time is significantly longer than the time it takes to produce a task, why not have a single consumer that simply creates a new Task to process each message?

class MessageController
{
    private BlockingCollection<IMessage> messageQueue = new BlockingCollection<IMessage>();

    public void Enqueue(IMessage message)
    {
        messageQueue.Add(message);
    }

    public void Consume()
    {
        //This loop will not exit until messageQueue.CompleteAdding() is called
        foreach (var item in messageQueue.GetConsumingEnumerable())
        {
            IMessage message = item;
            Task.Run(() => ProcessMessage(message)); // ProcessMessage is your own handler, not shown here
        }
    }
}
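
To show how such a controller might be driven and shut down, here is a hedged variant of the class above. It is a sketch under assumptions: DispatchingController, Complete and the processMessage delegate are hypothetical additions, and string messages stand in for IMessage.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class DispatchingController
{
    private readonly BlockingCollection<string> messageQueue = new BlockingCollection<string>();
    private readonly Action<string> processMessage;

    public DispatchingController(Action<string> processMessage)
    {
        this.processMessage = processMessage;
    }

    public void Enqueue(string message) => messageQueue.Add(message);

    // Signals that no more messages will arrive, which lets Consume() finish.
    public void Complete() => messageQueue.CompleteAdding();

    // Single consumer: hands each message to the thread pool, then waits for all handlers.
    public void Consume()
    {
        var running = new List<Task>();
        foreach (string message in messageQueue.GetConsumingEnumerable())
        {
            string current = message;
            running.Add(Task.Run(() => processMessage(current)));
        }
        Task.WaitAll(running.ToArray());
    }
}
```

A caller would start Consume() on its own task, call Enqueue from the producers, then call Complete() and wait for the consuming task to end. Keeping every Task in a list is only reasonable for a bounded run; a long-lived service would need another way to track outstanding work.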

