Multithreaded code runs threadnum times slower using System.Threading and the Visual Studio C# Express hosting process

Question

I have a very simple program counting the characters in a string. An integer threadnum sets the number of threads and divides the data by threadnum accordingly into chunks for each thread to process.

Each thread increments the values contained in a shared dictionary, building a character histogram.

private Dictionary<UInt32, int> dict = new Dictionary<UInt32, int>();

To wait for all threads to finish before continuing with the main thread, I call Thread.Join.
Initially I had a local dictionary for each thread, and these were merged afterwards, but a shared dictionary worked fine without locking.
Nothing is locked in the method BuildDictionary, though locking the dictionary did not significantly affect thread execution time.
Each thread is timed, and the resulting dictionaries are compared.
The dictionary content is the same whether one thread or several are used, as it should be.
Each thread takes a fraction of the single-threaded time, determined by threadnum, to complete, as it should be.

Problem:

The total time is roughly the time of a single thread multiplied by threadnum; in other words, the total execution time does not drop as threads are added. Why?

(Unfortunately I cannot run a C# profiler at the moment. Additionally, I would prefer C# 3 code compatibility.)

Others are likely struggling as well. Could it be that the VS 2010 Express edition vshost process queues the threads and schedules them to run sequentially?

Another MT-performance issue was recently posted here as "Visual Studio C# 2010 Express Debug running Faster than Release":

Code:

public int threadnum = 8;
Thread[] threads = new Thread[threadnum];
Stopwatch stpwtch = new Stopwatch();
stpwtch.Start();
for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
    threads[threadidx] = new Thread(BuildDictionary);
    threads[threadidx].Start(threadidx);
    threads[threadidx].Join(); //Blocks the calling thread, till thread completion
}
WriteLine("Total - time: {0} msec", stpwtch.ElapsedMilliseconds);

Can you help please?

Update:

It appears that the strange behavior of an almost linear slowdown with increasing thread count is an artifact of the numerous hooks of the IDE's debugger.

Running the process outside the development environment, I actually do get a 30% speed increase on a machine with 2 logical/physical cores. During debugging I am already at the high end of CPU utilization, so I suspect it is wise to have some leeway during development in the form of additional idle cores.

As I did initially, I now let each thread compute on its own local data chunk; the result is written back to a shared list under a lock and aggregated after all threads have finished.

Conclusion:

Be heedful of the environment the process is running in.

Answer 1

We can put the dictionary synchronization issues Tony the Lion mentions in his answer aside for the moment, because in your current implementation you are in fact not running anything in parallel!

Let's take a look at what you are currently doing in your loop:

Start a thread.
Wait for the thread to complete.
Start the next thread.

In other words, you should not be calling Join inside the loop.

Instead, you should start all threads as you are doing, but use a signaling construct such as an AutoResetEvent to determine when all threads have completed.

See example program:

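using System;
using System.Threading;

class Program
{
    // Caveat: an AutoResetEvent does not count signals. If two workers call
    // Set() before Main gets back to WaitOne(), one signal is lost and Main
    // can block forever.
    static EventWaitHandle _waitHandle = new AutoResetEvent(false);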

    static void Main(string[] args)
    {
        int numThreads = 5;
        for (int i = 0; i < numThreads; i++)
        {
            new Thread(DoWork).Start(i);
        }
        for (int i = 0; i < numThreads; i++)
        {
            _waitHandle.WaitOne();
        }
        Console.WriteLine("All threads finished");
    }

    static void DoWork(object id)
    {
        Thread.Sleep(1000);
        Console.WriteLine(String.Format("Thread {0} completed", (int)id));
        _waitHandle.Set();
    }
}

Alternatively, if you keep references to the threads, you can simply call Join on each of them in the second loop.

After you have done this you can and should worry about the dictionary synchronization problems.
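As a variation on the signaling example above, here is a minimal sketch that counts completions explicitly instead of relying on one AutoResetEvent signal per thread. The class name CountdownDemo, the method RunAll, and the 100 ms sleep are hypothetical stand-ins, not part of the original answer. The last worker to finish sets a single ManualResetEvent, so no signal can be lost if several workers finish at the same moment.

```csharp
using System.Threading;

// Hypothetical helper: starts numThreads workers and blocks until all finish.
static class CountdownDemo
{
    public static int RunAll(int numThreads)
    {
        if (numThreads <= 0)
        {
            return 0; // nothing to wait for
        }

        int remaining = numThreads; // workers that have not finished yet
        int completed = 0;          // workers that have finished
        ManualResetEvent allDone = new ManualResetEvent(false);

        for (int i = 0; i < numThreads; i++)
        {
            new Thread(() =>
            {
                Thread.Sleep(100); // stand-in for real work
                Interlocked.Increment(ref completed);

                // Only the last worker to finish releases the main thread.
                if (Interlocked.Decrement(ref remaining) == 0)
                {
                    allDone.Set();
                }
            }).Start();
        }

        allDone.WaitOne(); // blocks until the last worker has called Set()
        allDone.Close();
        return completed;
    }
}
```

Calling CountdownDemo.RunAll(5) should return 5 once every worker has finished. On .NET 4 and later, CountdownEvent packages the same idea; the sketch avoids it only to stay within the C# 3 / .NET 3.5 constraint mentioned in the question.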

Answer 2

From MSDN: "A Dictionary can support multiple readers concurrently, as long as the collection is not modified."

You say:

but a shared dictionary worked fine, without locking.

Each thread increments the values contained in a shared dictionary

Your program is by definition broken. If you alter the data in the dictionary without proper locking, you will end up with bugs. Nothing more needs to be said.
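For illustration, here is a minimal sketch of what "proper locking" could look like for a shared histogram. The asker's BuildDictionary was never posted, so the class LockedHistogram, the method Count, and the chunking arithmetic are assumptions made for this example. Every read and write of the shared dictionary happens inside the same lock, and the threads are joined only after all of them have been started.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper, not the asker's BuildDictionary.
static class LockedHistogram
{
    public static Dictionary<char, int> Count(string text, int threadCount)
    {
        Dictionary<char, int> counts = new Dictionary<char, int>();
        object gate = new object(); // guards every access to counts
        Thread[] threads = new Thread[threadCount];

        for (int t = 0; t < threadCount; t++)
        {
            // Contiguous chunk [start, end) for this thread. Declared inside
            // the loop body so each lambda captures its own copy.
            int start = (int)((long)text.Length * t / threadCount);
            int end = (int)((long)text.Length * (t + 1) / threadCount);

            threads[t] = new Thread(() =>
            {
                for (int i = start; i < end; i++)
                {
                    lock (gate)
                    {
                        int current;
                        // Leaves current at 0 when the key is missing.
                        counts.TryGetValue(text[i], out current);
                        counts[text[i]] = current + 1;
                    }
                }
            });
            threads[t].Start();
        }

        // Join only after every thread has been started.
        foreach (Thread thread in threads)
        {
            thread.Join();
        }
        return counts;
    }
}
```

For example, LockedHistogram.Count("aabbbc", 3) yields a count of 2 for 'a', 3 for 'b' and 1 for 'c'. Because the lock is taken once per character, the threads spend most of their time waiting for each other, so this version is correct but unlikely to be faster than a single thread.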

Answer 3

I wouldn't use a shared static Dictionary. If each thread worked on a local copy, you could amalgamate the results once all threads had signalled completion.

WaitHandle.WaitAll avoids any deadlocking on an AutoResetEvent.

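using System;
using System.Collections.Generic;
using System.Threading;

class Program
{
    // Inclusive range of text indices handled by one thread.
    struct Partition
    {
        public int Start;
        public int End;
    }

    static void Main()
    {
        char[] text = "Some String".ToCharArray();
        int numThreads = 5;

        Partition[] partitions = PartitionWork(text, numThreads);

        WaitHandle[] completions = new WaitHandle[numThreads];
        IDictionary<char, int>[] results = new IDictionary<char, int>[numThreads];

        for (int i = 0; i < numThreads; i++)
        {
            // Per-iteration locals, so each lambda captures its own values
            // rather than the shared loop variable.
            Partition part = partitions[i];
            IDictionary<char, int> local = new Dictionary<char, int>();
            ManualResetEvent done = new ManualResetEvent(false);
            results[i] = local;
            completions[i] = done;

            // Thread.Start accepts at most one argument, so the rest is
            // passed through a closure.
            new Thread(() => BuildDictionary(text, part.Start, part.End, local, done)).Start();
        }

        // WaitAll rejects timeouts longer than Int32.MaxValue milliseconds
        // (about 24 days), so wait for at most one day.
        if (WaitHandle.WaitAll(completions, TimeSpan.FromDays(1)))
        {
            Console.WriteLine("All threads finished");
        }
        else
        {
            Console.WriteLine("Timed out after a day");
            return; // the partial results are incomplete
        }

        // Merge the results
        IDictionary<char, int> result = results[0];
        for (int i = 1; i < numThreads; i++)
        {
            foreach (KeyValuePair<char, int> item in results[i])
            {
                if (result.ContainsKey(item.Key))
                {
                    result[item.Key] += item.Value;
                }
                else
                {
                    result.Add(item.Key, item.Value);
                }
            }
        }
    }

    // The original answer left this to the OP; this is one possible
    // implementation that splits the text into near-equal inclusive ranges.
    static Partition[] PartitionWork(char[] text, int count)
    {
        Partition[] partitions = new Partition[count];
        for (int i = 0; i < count; i++)
        {
            partitions[i].Start = (int)((long)text.Length * i / count);
            partitions[i].End = (int)((long)text.Length * (i + 1) / count) - 1;
        }
        return partitions;
    }

    static void BuildDictionary(
        char[] text,
        int start,
        int finish,
        IDictionary<char, int> result,
        EventWaitHandle completed)
    {
        for (int i = start; i <= finish; i++)
        {
            if (result.ContainsKey(text[i]))
            {
                result[text[i]]++;
            }
            else
            {
                result.Add(text[i], 1);
            }
        }
        completed.Set();
    }
}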

With this implementation the only variable that is ever shared is the char[] of the text, and that is always read-only.

You do have the burden of merging the dictionaries at the end, but that is a small price for avoiding any concurrency issues. In a later version of the framework I would have used TPL and ConcurrentDictionary and possibly Partitioner<TSource>.
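A rough sketch of the TPL variant mentioned above, assuming .NET 4 or later (so it does not meet the C# 3 preference stated in the question). The class ConcurrentHistogram and its Count method are hypothetical names. Partitioner.Create splits the index range into chunks, Parallel.ForEach runs the chunks on pool threads, and ConcurrentDictionary.AddOrUpdate performs each increment atomically.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper illustrating the TPL approach.
static class ConcurrentHistogram
{
    public static IDictionary<char, int> Count(string text)
    {
        ConcurrentDictionary<char, int> counts = new ConcurrentDictionary<char, int>();
        if (text.Length == 0)
        {
            return counts; // Partitioner.Create rejects an empty range
        }

        // Each range is a Tuple<int, int> of [fromInclusive, toExclusive).
        Parallel.ForEach(Partitioner.Create(0, text.Length), range =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
            {
                // Insert 1 for a new key, otherwise add 1 to the old value.
                counts.AddOrUpdate(text[i], 1, (key, old) => old + 1);
            }
        });
        return counts;
    }
}
```

Whether this beats per-thread local dictionaries plus a final merge depends on how contended the keys are; for a small alphabet, local counting followed by a merge may well be faster.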

Answer 4

Rotem saw it.

Your main thread should Join the other threads only after it has started all of them.

Otherwise it waits for the first thread to finish before it starts, and then waits for, the second one.

for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
    threads[threadidx] = new Thread(BuildDictionary);
    threads[threadidx].Start(threadidx);
}

for (var threadidx = 0; threadidx < threadnum; threadidx++)
{
    threads[threadidx].Join(); //Blocks the calling thread, till thread completion
}

Answer 5

I totally agree with TonyTheLion and others. Even once you fix the actual problem, which is joining in the wrong place, there will still be a problem with updating the shared dictionary without locks. I wanted to offer a quick workaround: wrap your integer value in an object.

instead of:

Dictionary<uint, int> dict = new Dictionary<uint, int>();

use:

class Entry { public int value; }
Dictionary<uint, Entry> dict = new Dictionary<uint, Entry>();

and now increment Entry.value instead. That way, the Dictionary itself does not see any changes, so the increment can be done without locking the dictionary.

Note: this works only if each thread is guaranteed to touch only its own Entry. I've just noticed that this is not the case here, since you said 'histogram of characters'. You will have to lock each Entry during the increment, or some increments may be lost. Still, locking at the Entry level will be significantly faster than locking the whole dictionary.
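A minimal sketch of the per-Entry locking described above. The class EntryHistogram and the pre-population pass are assumptions added for illustration, and the key type is char rather than uint to keep the example short. Every key is inserted before any worker starts, because adding a key modifies the dictionary and would still need a dictionary-wide lock; while the workers run, the dictionary is only read.

```csharp
using System.Collections.Generic;
using System.Threading;

class Entry { public int value; }

// Hypothetical helper illustrating locks at the Entry level.
static class EntryHistogram
{
    public static Dictionary<char, Entry> Count(string text, int threadCount)
    {
        // Single-threaded phase: create every Entry up front.
        Dictionary<char, Entry> dict = new Dictionary<char, Entry>();
        foreach (char c in text)
        {
            if (!dict.ContainsKey(c))
            {
                dict.Add(c, new Entry());
            }
        }

        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++)
        {
            // Contiguous chunk [start, end) for this thread.
            int start = (int)((long)text.Length * t / threadCount);
            int end = (int)((long)text.Length * (t + 1) / threadCount);

            threads[t] = new Thread(() =>
            {
                for (int i = start; i < end; i++)
                {
                    Entry entry = dict[text[i]]; // read-only lookup
                    lock (entry)                 // lock just this counter
                    {
                        entry.value++;
                    }
                }
            });
            threads[t].Start();
        }

        foreach (Thread thread in threads)
        {
            thread.Join();
        }
        return dict;
    }
}
```

The pre-population pass is itself a full scan of the text, so this sketch demonstrates the locking pattern rather than a speedup. Interlocked.Increment(ref entry.value) would be a lock-free alternative to the lock statement.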

Answer 6

As Rotem points out, by joining inside the loop you are waiting for each thread to complete before continuing.

The reason can be found in the Thread.Join documentation on MSDN:

Blocks the calling thread until a thread terminates

So your loop will not continue until that one thread has completed its work. To start all the threads and then wait for them to complete, join them in a separate loop after the one that starts them:

public int threadnum = 8;
Thread[] threads = new Thread[threadnum];
Stopwatch stpwtch = new Stopwatch();
stpwtch.Start();

// Start all the threads doing their work
for (var threadidx = 0; threadidx < threadnum; threadidx++) 
{
     threads[threadidx] = new Thread(BuildDictionary);
     threads[threadidx].Start(threadidx);
}
// Join to all the threads to wait for them to complete
for (var threadidx = 0; threadidx < threadnum; threadidx++) 
{
    threads[threadidx].Join();
}

System.Diagnostics.Debug.WriteLine("Total - time: {0} msec", stpwtch.ElapsedMilliseconds);

You will really need to post your BuildDictionary function. It is very likely that the operation will be no faster with multiple threads and the threading overhead will actually increase execution time.

6 answers

solution1
3 2012-09-06 08:14:09

solution2
2 2012-09-06 08:09:34

solution3
1 2012-09-06 09:29:14

solution4
0 2012-09-06 08:12:21

solution5
0 2012-09-06 08:17:07

solution6
0 2012-09-06 08:30:23
