Lock-free Reference Counting

Question

I'm working on a system that requires extensive C API interop. Part of the interop requires initialization and shutdown of the system in question before and after any operations. Failure to do either will result in instability in the system. I've accomplished this by simply implementing reference counting in a core disposable environment class like this:

public FooEnvironment()
{
  lock(EnvironmentLock)
  {
    if(_initCount == 0)
    {
      Init();  // global startup
    }
    _initCount++;
  }
}

private void Dispose(bool disposing)
{
  if(_disposed)
    return;

  if(disposing)
  {
    lock(EnvironmentLock)
    {
      _initCount--;
      if(_initCount == 0)
      {
        Term(); // global termination
      }
    }
  }
}

This works fine and accomplished the goal. However, since any interop operation must be nested in a FooEnvironment using block, we are locking all the time and profiling suggests that this locking accounts for close to 50% of the work done during run-time. It seems to me that this is a fundamental enough concept that something in .NET or the CLR must address it. Is there a better way to do reference counting?

Answer 1

This is a trickier task than you might expect at first blush. I don't believe that Interlocked.Increment will be sufficient to your task. Rather, I expect you to need to perform some wizardry with CAS (Compare-And-Swap).

Note also that it's very easy to get this mostly-right, but mostly-right is still completely wrong when your program crashes with heisenbugs.

I strongly suggest some genuine research before going down this path. A couple good jumping off points pop to the top if you do a search for "Lock free reference counting." This Dr. Dobbs article is useful, and this SO Question might be relevant.

Above all, remember that lock free programming is hard . If this is not your specialty, consider stepping back and adjusting your expectations around the granularity of your reference counts. It may be much, much less expensive to rethink your fundamental refcount policy than to create a reliable lock-free mechanism if you're not an expert. Especially when you don't yet know that a lock-free technique will actually be any faster.

Answer 2

As harold's comment notes the answer is Interlocked :

public FooEnvironment() {
  if (Interlocked.Increment(ref _initCount) == 1) {
    Init();  // global startup
  }
}

private void Dispose(bool disposing) {
  if(_disposed)
    return;

  if (disposing) {
    if (0 == Interlocked.Decrement(ref _initCount)) {
      Term(); // global termination
    }
  }
}

Both Increment and Decrement return the new count (just for this kind of usage), hence different checks.

But note: this will not work if anything else needs concurrency protection. Interlocked operations are themselves safe, but nothing else is (including different threads relative ordering of Interlocked calls). In the above code Init() can still be running after another thread has completed the constructor.

Answer 3

Probably use a general static variable in a class. Static is only one thing and is not specific to any object.

Answer 4

I believe this will give you a safe way using Interlocked.Increment/Decrement.

Note : This is oversimplified, the code below can lead to deadlock if Init() throws an exception. There is also a race condition in the Dispose when the count goes to zero, the init is reset and the constructor is called again. I don't know your program flow, so you may be better off using a cheaper lock like a SpinLock as opposed to the InterlockedIncrement if you have potential of initing again after several dispose calls.

static ManualResetEvent _inited = new ManualResetEvent(false);
public FooEnvironment()
{
    if(Interlocked.Increment(ref _initCount) == 1)
    {
        Init();  // global startup
        _inited.Set();
    }

    _inited.WaitOne();
}

private void Dispose(bool disposing)
{
    if(_disposed)
        return;

    if(disposing)
    {
        if(Interlocked.Decrement(ref _initCount) == 0)
        {
            _inited.Reset();
            Term(); // global termination
        }
    }
}

Edit:
In thinking about this further, you may want to consider some application redesign and instead of this class to manage Init and Term, just have a single call to Init at application startup and a call to Term when the app comes down, then you remove the need for locking altogether, and if the lock is showing up as 50% of your execution time, then it seems like you are always going to want to call Init, so just call it and away you go.

Answer 5

You can make it nearly lock-free by using the following code. It would definitely lower contention and if this is your main problem it would be the solution you need.

Also I would suggest to call Dispose from destructor/finalizer (just in case). I have changed your Dispose method - unmanaged resources should be freed regardless of disposing argument. Check this for details on how to properly dispose an object.

Hope this helps you.

public class FooEnvironment
{
    private static int _initCount;
    private static bool _initialized;
    private static object _environmentLock = new object();

    private bool _disposed;

    public FooEnvironment()
    {
        Interlocked.Increment(ref _initCount);

        if (_initCount > 0 && !_initialized)
        {
            lock (_environmentLock)
            {
                if (_initCount > 0 && !_initialized)
                {
                    Init(); // global startup
                    _initialized = true;
                }
            }
        }
    }

    private void Dispose(bool disposing)
    {
        if (_disposed)
            return;

        if (disposing)
        {
            // Dispose managed resources here
        }

        Interlocked.Decrement(ref _initCount);

        if (_initCount <= 0 && _initialized)
        {
            lock (_environmentLock)
            {
                if (_initCount <= 0 && _initialized)
                {
                    Term(); // global termination
                    _initialized = false;
                }
            }
        }

        _disposed = true;
    }

    ~FooEnvironment()
    {
        Dispose(false);
    }
}

Answer 6

Using Threading.Interlocked.Increment will be a little faster than acquiring a lock, doing an increment, and releasing the lock, but not enormously so. The expensive part of either operation on a multi-core system is enforcing the synchronization of memory caches between cores. The primary advantage of Interlocked.Increment is not speed, but rather the fact that it will complete in a bounded amount of time. By contrast, if one seeks to acquire a lock, perform an increment, and release the lock, even if the lock is used for no purpose other than guarding the counter, there is a risk that one might have to wait forever if some other thread acquires the lock and then gets waylaid.

You don't mention which version of .net you're using, but there are some Concurrent classes that might be of use. Depending upon your patterns of allocating and freeing things, a class that might seem a little tricky but could work well is the ConcurrentBag class. It's somewhat like a queue or stack, except that there's no guarantee that things will come out any particular order. Include in your resource wrapper a flag indicating whether it's still good, and include with the resource itself a reference to a wrapper. When an resource user is created, throw a wrapper object in the bag. When the resource user is no longer needed, set the "invalid" flag. The resource should remain alive as long as either there's at least one wrapper object in the bag whose "valid" flag is set, or the resource itself holds a reference to a valid wrapper. If when an item is deleted the resource doesn't seem to hold a valid wrapper, acquire a lock and, if the resource still doesn't hold a valid wrapper, pull wrappers out of the bag until a valid one is found, and then store that one with the resource (or, if none was found, destroy the resource). If when an item is deleted the resource holds a valid wrapper but the bag seems like it might hold an excessive number of invalid items, acquire the lock, copy the bag's contents to an array, and throw valid items back into the bag. Keep a count of how many items are thrown back, so one can judge when to do the next purge.

This approach may seem more complicated than using locks or Threading.Interlocked.Increment , and there are a lot of corner cases to worry about, but it may offer better performance because ConcurrentBag is designed to reduce resource contention. If processor 1 performs Interlocked.Increment on some location, and then processor 2 does so, processor 2 will have to instruct processor 1 to flush that location from its cache, wait until processor 1 has done so, inform all the other processors that it needs control of that location, load that location into its cache, and finally get around to incrementing it. After all that has happened, if processor 1 needs to increment the location again, the same general sequence of steps will be required. All of this is very slow. The ConcurrentBag class, by contrast, is designed so that multiple processors can add things to a list without cache collisions. Sometime between when things are added and when they're removed, they'll have to be copied to a coherent data structure, but such operations can be performed in batches in such a way as to yield good cache performance.

I haven't tried an approach like the above using ConcurrentBag , so I don't know what sort of performance it would actually yield, but depending upon the usage patterns it may be possible to give better performance than would be obtained via reference counting.

Answer 7

Interlocked class approach work a little faster than the lock statment, but on a multi-core machine the speed advantage may not be very much, because Interlocked instructions must bypass the memory cache layers.

How important is it to call the Term() function when the code is not in use and/or when the program exits?

Frequently, you can just put the call to Init() once in a static constructor for the class that wraps the other APIs, and not really worry about calling Term(). Eg:

static FooEnvironment() { 
    Init();  // global startup 
}

The CLR will ensure that the static constructor will get called once, before any other member functions in the enclosing class.

It's also possible to hook notification of some (but not all) application shutdown scenarios, making it possible to call Term() on clean shutdowns. See this article. http://www.codeproject.com/Articles/16164/Managed-Application-Shutdown

Lock-free Reference Counting

Question

7 answers

solution1
9 ACCPTED 2012-04-09 15:16:26

solution2
1 2012-04-09 13:55:30

solution3
0 2012-04-09 13:49:59

solution4
0 2012-04-09 13:55:54

solution5
0 2012-04-10 09:26:05

solution6
0 2012-04-12 01:05:34

solution7
0 2012-04-23 17:08:07

Lock-free Reference Counting

Question

7 answers

solution1 9 ACCPTED 2012-04-09 15:16:26

solution2 1 2012-04-09 13:55:30

solution3 0 2012-04-09 13:49:59

solution4 0 2012-04-09 13:55:54

solution5 0 2012-04-10 09:26:05

solution6 0 2012-04-12 01:05:34

solution7 0 2012-04-23 17:08:07

solution1
9 ACCPTED 2012-04-09 15:16:26

solution2
1 2012-04-09 13:55:30

solution3
0 2012-04-09 13:49:59

solution4
0 2012-04-09 13:55:54

solution5
0 2012-04-10 09:26:05

solution6
0 2012-04-12 01:05:34

solution7
0 2012-04-23 17:08:07