
.NET 4.0 Concurrent collection performance

I am trying to write a program where I schedule items for removal by putting them in a collection from different threads, and clean them up in a single thread that iterates over the collection and disposes the items.
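To make the scenario concrete, here is a minimal sketch of such a disposal scheme, assuming a ConcurrentQueue<IDisposable> as the backing collection. The names DisposalQueue, Tracked and DisposalDemo are hypothetical and only serve the illustration; they are not from the original program.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: many threads schedule items, one thread disposes them.
public sealed class DisposalQueue
{
    private readonly ConcurrentQueue<IDisposable> pending =
        new ConcurrentQueue<IDisposable>();

    // Safe to call from any thread.
    public void Schedule(IDisposable item)
    {
        pending.Enqueue(item);
    }

    // Intended to be called from the single cleanup thread.
    // Returns the number of items disposed in this pass.
    public int DisposePending()
    {
        int count = 0;
        IDisposable item;
        while (pending.TryDequeue(out item))
        {
            item.Dispose();
            count++;
        }
        return count;
    }
}

// Hypothetical disposable that counts how often Dispose was called.
public sealed class Tracked : IDisposable
{
    public static int Disposed;

    public void Dispose()
    {
        Interlocked.Increment(ref Disposed);
    }
}

public static class DisposalDemo
{
    public static void Main()
    {
        var queue = new DisposalQueue();

        // Producers: schedule items from several threads.
        Parallel.For(0, 1000, i => queue.Schedule(new Tracked()));

        // Consumer: a single pass that drains and disposes everything queued so far.
        int disposed = queue.DisposePending();
        Console.WriteLine("disposed " + disposed + " items");
    }
}
```

Whether a queue, stack or bag is the better choice here depends on the measurements discussed in this thread; the queue is used only because the order of disposal does not matter for the sketch.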

Before doing so, I wondered what would yield the best performance, so I tried ConcurrentBag, ConcurrentStack and ConcurrentQueue and measured the time required to add 10,000,000 items.

I used the following program to test this:

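using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

class Program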
{
    static List<int> list = new List<int>();
    static ConcurrentBag<int> bag = new ConcurrentBag<int>();
    static ConcurrentStack<int> stack = new ConcurrentStack<int>();
    static ConcurrentQueue<int> queue = new ConcurrentQueue<int>();
    static void Main(string[] args)
    {
        run(addList);
        run(addBag);
        run(addStack);
        run(addQueue);
        Console.ReadLine();
    }

    private static void addList(int obj) { lock (list) { list.Add(obj); } }

    private static void addStack(int obj) { stack.Push(obj); }

    private static void addQueue(int obj) { queue.Enqueue(obj); }

    private static void addBag(int obj) { bag.Add(obj); }



    private static void run(Action<int> action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        Parallel.For(0, 10000000, new ParallelOptions() { MaxDegreeOfParallelism = # }, action);
        stopwatch.Stop();
        Console.WriteLine(action.Method.Name + " takes " + stopwatch.Elapsed);
    }
}

where # is the number of threads used.

But the results are rather confusing:

with 8 threads:

  • addList takes 00:00:00.8166816
  • addBag takes 00:00:01.0368712
  • addStack takes 00:00:01.0902852
  • addQueue takes 00:00:00.6555039

with 1 thread:

  • addList takes 00:00:00.3880958
  • addBag takes 00:00:01.5850249
  • addStack takes 00:00:01.2764924
  • addQueue takes 00:00:00.4409501

So, no matter how many threads are used, it seems that just locking a plain old list is faster than using any of the concurrent collections, except perhaps the queue if it needs to handle a lot of writes.

EDIT: after the comments below about garbage collection and Debug builds: yes, both influence the benchmark. The influence of a Debug build would be linear, while the cost of garbage collection would increase with higher memory usage.

Yet running the same test multiple times gives roughly the same results.

I moved the initialization of the collection to right before the test run and collect the garbage after the run now, like this:

        list = new List<int>();
        run(addList);
        list = null;
        GC.Collect();

With MaxDegreeOfParallelism set to 8 I get the following results:

  • addList takes 00:00:00.7959546
  • addBag takes 00:00:01.08023823
  • addStack takes 00:00:01.1354566
  • addQueue takes 00:00:00.6597145

with a deviation of give or take 0.02 seconds each time I run the code.
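The measurement procedure described above (fresh collection per run, garbage collected between runs, repeated runs) can be sketched as a small helper. This is only a sketch under assumptions: Bench and Measure are hypothetical names, and taking the median of several rounds after a short warm-up is a common convention rather than what the original test did.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading.Tasks;

public static class Bench
{
    // Hypothetical helper: `setup` creates a fresh collection and returns the
    // action to time. Returns the median elapsed time over `rounds` runs.
    public static TimeSpan Measure(Func<Action<int>> setup, int iterations,
                                   int threads, int rounds)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = threads };

        // Warm-up run so that JIT compilation is not part of the timings.
        Parallel.For(0, Math.Min(iterations, 1000), options, setup());

        var samples = new TimeSpan[rounds];
        for (int r = 0; r < rounds; r++)
        {
            Action<int> body = setup(); // fresh collection for every round

            // Collect garbage left over from the previous round before timing.
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            Stopwatch stopwatch = Stopwatch.StartNew();
            Parallel.For(0, iterations, options, body);
            stopwatch.Stop();
            samples[r] = stopwatch.Elapsed;
        }

        Array.Sort(samples);
        return samples[rounds / 2];
    }

    public static void Main()
    {
        TimeSpan median = Measure(() =>
        {
            var queue = new ConcurrentQueue<int>();
            return item => queue.Enqueue(item);
        }, 1000000, 8, 5);

        Console.WriteLine("addQueue median: " + median);
    }
}
```

Running this from a Release build without a debugger attached should reduce the distortion mentioned in the comments.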

The concurrent collections are not always faster. Most of them only see perf gains at higher levels of contention, and the actual workload has an impact as well. Check out this paper from the PFX team :)

http://blogs.msdn.com/b/pfxteam/archive/2010/04/26/9997562.aspx

Beware of premature optimization though. Put something together that works and then optimize, especially since the actual workload is important. Also, having locks as a perf bottleneck is pretty rare; usually there is some IO or other algorithm that takes far longer :)

Don't forget that you not only have to add the items to the collection, but also have to retrieve them. So a fairer comparison is between a Monitor-based Queue<T> and a BlockingCollection<T>, each with 8 producers and 1 consumer.

Then I get the following results on my machine (I've increased the number of iterations by a factor of 10):

  • AddQueue1 takes 00:00:18.0119159
  • AddQueue2 takes 00:00:13.3665991

But it's not only performance that's interesting. Have a look at the two approaches: it's very difficult to check Add/ConsumeQueue1 for correctness, while it's very easy to see that Add/ConsumeQueue2 does exactly what's intended, thanks to the abstraction provided by BlockingCollection<T>.


static Queue<int> queue1 = new Queue<int>();
static BlockingCollection<int> queue2 = new BlockingCollection<int>();

static void Main(string[] args)
{
    Run(AddQueue1, ConsumeQueue1);
    Run(AddQueue2, ConsumeQueue2);
    Console.ReadLine();
}

private static void AddQueue1(int obj)
{
    lock (queue1)
    {
        queue1.Enqueue(obj);
        if (queue1.Count == 1)
            Monitor.Pulse(queue1);
    }
}

private static void ConsumeQueue1()
{
    lock (queue1)
    {
        while (true)
        {
            while (queue1.Count == 0)
                Monitor.Wait(queue1);
            var item = queue1.Dequeue();
            // do something with item
        }
    }
}

private static void AddQueue2(int obj)
{
    queue2.TryAdd(obj);
}

private static void ConsumeQueue2()
{
    foreach (var item in queue2.GetConsumingEnumerable())
    {
        // do something with item
    }
}

private static void Run(Action<int> action, ThreadStart consumer)
{
    new Thread(consumer) { IsBackground = true }.Start();
    Stopwatch stopwatch = Stopwatch.StartNew();
    Parallel.For(0, 100000000, new ParallelOptions() { MaxDegreeOfParallelism = 8 }, action);
    stopwatch.Stop();
    Console.WriteLine(action.Method.Name + " takes " + stopwatch.Elapsed);
}
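One detail the BlockingCollection<T> abstraction also covers is shutting the consumer down. The following is a minimal sketch, assuming the producers can signal when they are done; BlockingDemo and ProduceAndConsume are hypothetical names. CompleteAdding makes the GetConsumingEnumerable loop end once the collection is drained, so the consumer does not have to be a background thread that runs forever.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class BlockingDemo
{
    // Hypothetical helper: several producers, one consumer, clean shutdown.
    // Returns the number of items the consumer processed.
    public static int ProduceAndConsume(int producers, int itemsPerProducer)
    {
        using (var collection = new BlockingCollection<int>())
        {
            int consumed = 0;

            // Single consumer: the loop blocks while the collection is empty
            // and ends after CompleteAdding once all items have been taken.
            Task consumer = Task.Factory.StartNew(() =>
            {
                foreach (int item in collection.GetConsumingEnumerable())
                {
                    consumed++; // only this one task writes to `consumed`
                }
            }, TaskCreationOptions.LongRunning);

            // Producers add items from several threads.
            Parallel.For(0, producers, p =>
            {
                for (int i = 0; i < itemsPerProducer; i++)
                {
                    collection.Add(i);
                }
            });

            collection.CompleteAdding(); // no more items will arrive
            consumer.Wait();             // wait until the consumer has drained it
            return consumed;
        }
    }

    public static void Main()
    {
        Console.WriteLine("consumed " + ProduceAndConsume(8, 1000));
    }
}
```

Getting the same shutdown behaviour out of the Monitor-based version would need an extra flag and another Pulse, which is part of why it is harder to verify.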

I wanted to see a comparison of performance for adding as well as taking. Here is the code I used:

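using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

class Program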
{
    static List<int> list = new List<int>();
    static ConcurrentBag<int> bag = new ConcurrentBag<int>();
    static ConcurrentStack<int> stack = new ConcurrentStack<int>();
    static ConcurrentQueue<int> queue = new ConcurrentQueue<int>();
    static void Main(string[] args)
    {
        list = new List<int>();
        run(addList);
        run(takeList);

        list = null;
        GC.Collect();

        bag = new ConcurrentBag<int>();
        run(addBag);
        run(takeBag);

        bag = null;
        GC.Collect();

        stack = new ConcurrentStack<int>();
        run(addStack);
        run(takeStack);

        stack = null;
        GC.Collect();

        queue = new ConcurrentQueue<int>();
        run(addQueue);
        run(takeQueue);

        queue = null;
        GC.Collect();

        Console.ReadLine();
    }

    private static void takeList(int obj)
    {
        lock (list)
        {
            if (list.Count == 0)
                return;

            int output = list[obj];
        }
    }

    private static void takeStack(int obj)
    {
        stack.TryPop(out int output);
    }

    private static void takeQueue(int obj)
    {
        queue.TryDequeue(out int output);
    }

    private static void takeBag(int obj)
    {
        bag.TryTake(out int output);
    }

    private static void addList(int obj) { lock (list) { list.Add(obj); } }

    private static void addStack(int obj) { stack.Push(obj); }

    private static void addQueue(int obj) { queue.Enqueue(obj); }

    private static void addBag(int obj) { bag.Add(obj); }



    private static void run(Action<int> action)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        Parallel.For(0, 10000000, new ParallelOptions()
        {
            MaxDegreeOfParallelism = 8
        }, action);
        stopwatch.Stop();
        Console.WriteLine(action.Method.Name + " takes " + stopwatch.Elapsed);
    }
}

And the output is:

  • addList takes 00:00:00.8875893
  • takeList takes 00:00:00.7500289
  • addBag takes 00:00:01.8651759
  • takeBag takes 00:00:00.5749322
  • addStack takes 00:00:01.5565545
  • takeStack takes 00:00:00.3838718
  • addQueue takes 00:00:00.8861318
  • takeQueue takes 00:00:01.0510706

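Yes, but the point is that you need real concurrency: run it with multiple threads adding and taking at the same time, over a long period, to see the average performance. The test above does not take the different locking strategies of the collections into account.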
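A mixed-load measurement of that kind could look roughly like the sketch below. It is an illustration under assumptions, not a definitive benchmark: MixedLoad and Run are hypothetical names, ConcurrentQueue<int> stands in for whichever collection is under test, and the queue is unbounded, so memory grows if the producers outpace the consumers during a long run.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class MixedLoad
{
    // Hypothetical helper: producers and consumers hit the same queue
    // concurrently for `duration`. Returns the number of completed operations
    // (successful enqueues plus successful dequeues).
    public static long Run(int producers, int consumers, TimeSpan duration)
    {
        var queue = new ConcurrentQueue<int>();
        long operations = 0;

        using (var cts = new CancellationTokenSource())
        {
            var tasks = new Task[producers + consumers];
            for (int i = 0; i < tasks.Length; i++)
            {
                bool isProducer = i < producers;
                tasks[i] = Task.Factory.StartNew(() =>
                {
                    long local = 0; // per-thread counter to avoid extra contention
                    int item;
                    while (!cts.IsCancellationRequested)
                    {
                        if (isProducer)
                        {
                            queue.Enqueue(1);
                            local++;
                        }
                        else if (queue.TryDequeue(out item))
                        {
                            local++;
                        }
                    }
                    Interlocked.Add(ref operations, local);
                }, TaskCreationOptions.LongRunning);
            }

            Thread.Sleep(duration); // let the load run for the requested time
            cts.Cancel();
            Task.WaitAll(tasks);
        }

        return Interlocked.Read(ref operations);
    }

    public static void Main()
    {
        long ops = Run(4, 4, TimeSpan.FromSeconds(1));
        Console.WriteLine("operations in 1 second: " + ops);
    }
}
```

Comparing the operation counts of different collections under the same producer/consumer mix gives a throughput figure that includes contention between adding and taking.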

