
How to implement a thread-safe cache mechanism when working with collections?

Scenario :

  • I have a bunch of Child objects, all related to a given Parent .
  • I'm working on an ASP.NET MVC 3 web application (i.e. multi-threaded).
  • One of the pages is a "search" page where I need to grab a given subset of children and "do stuff" to them in memory (calculations, ordering, enumeration).
  • Instead of getting each child in a separate call, I make one call to the database to get all children for a given parent, cache the result, and "do stuff" on it.
  • The problem is that the "do stuff" involves LINQ operations (enumeration, adding/removing items from the collection), which are not thread-safe when implemented with List<T> .

I have read about ConcurrentBag<T> and ConcurrentDictionary<TKey, TValue> , but I'm not sure whether I should use one of these or implement the synchronization/locking myself.

I'm on .NET 4, so I'm using ObjectCache as a singleton instance of MemoryCache.Default . I have a service layer that works with the cache; each service accepts an instance of ObjectCache via constructor DI. This way all the services share the same ObjectCache instance.
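As a minimal sketch of that DI shape (the service class and field names here are hypothetical, not from my real code):

```csharp
using System;
using System.Runtime.Caching;

// Hypothetical service taking the shared cache via constructor injection.
// The DI container registers MemoryCache.Default as the singleton ObjectCache,
// so every service resolved per-request sees the same cache instance.
public class ChildService
{
    private readonly ObjectCache _cache;

    public ChildService(ObjectCache cache)
    {
        if (cache == null) throw new ArgumentNullException("cache");
        _cache = cache;
    }
}
```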

The main thread-safety issue is that I need to loop through the current "cached" collection, and if the child I'm working with is already there, remove it and add the one I'm working with back in. This enumeration is what causes the issues.

Any recommendations?

Yes: to implement a cache efficiently it needs a fast lookup mechanism, so List<T> is the wrong data structure out of the box. A Dictionary<TKey, TValue> is an ideal data structure for a cache because it provides a way to replace:

var value = instance.GetValueExpensive(key);

with:

var value = instance.GetValueCached(key);

by using cached values in a dictionary and using the dictionary to do the heavy lifting for lookup. The caller is none the wiser.

But if the callers could be calling from multiple threads, then .NET 4 provides ConcurrentDictionary<TKey, TValue> , which works perfectly in this situation. But what does the dictionary cache? It seems like in your situation the dictionary key is the child and the dictionary values are the database results for that child.
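A minimal sketch of that shape, with hypothetical key/value types standing in for the real child id and database results:

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

public class ChildResultCache
{
    // Key: child id; value: the frozen database results for that child.
    private readonly ConcurrentDictionary<int, IList<string>> _cache =
        new ConcurrentDictionary<int, IList<string>>();

    // GetOrAdd is atomic with respect to the dictionary: concurrent callers
    // always observe a single winning value for a given key. (The value
    // factory itself may run more than once under a race, so it should be
    // side-effect free.)
    public IList<string> GetResults(int childId)
    {
        return _cache.GetOrAdd(childId, LoadFromDatabase);
    }

    private IList<string> LoadFromDatabase(int id)
    {
        // Placeholder for the real repository call.
        return new List<string> { "result-for-" + id }.AsReadOnly();
    }
}
```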

OK, so now we have a thread-safe and efficient cache of database results keyed by child. What data structure should we use for the database results?

You haven't said what those results look like but since you use LINQ we know that they are at least IEnumerable<T> and maybe even List<T> . So we're back to the same problem, right? Because List<T> is not thread-safe, we can't use it for the dictionary value. Or can we?

A cache must be read-only from the point of view of the caller. You say that with LINQ you "do stuff" like add/remove, but that makes no sense for a cached value. It only makes sense to do stuff in the implementation of the cache itself, such as replacing a stale entry with new results.

The dictionary value, because it is read-only, can be a List<T> with no ill effects, even though it will be accessed from multiple threads. You can use List<T>.AsReadOnly to improve your confidence and add some compile-time safety checking.

But the important point is that List<T> is only thread-unsafe if it is mutable. Since, by definition, a method implemented using a cache must return the same value when called multiple times (until the value is invalidated by the cache itself), the client cannot modify the returned value, so the List<T> must be frozen, i.e. effectively immutable.
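For example, AsReadOnly wraps the list in a ReadOnlyCollection<T> that exposes no mutators:

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

var list = new List<int> { 1, 2, 3 };

// The wrapper has no Add/Remove/indexer-set, so accidental mutation of the
// cached value becomes a compile-time error for callers.
ReadOnlyCollection<int> frozen = list.AsReadOnly();

// frozen.Add(4);   // does not compile: ReadOnlyCollection<T> has no Add

// Caveat: AsReadOnly is a wrapper, not a copy. If the cache implementation
// keeps mutating the underlying list, readers of the wrapper will see those
// changes, so the backing list must itself be treated as frozen once published.
```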

If the client desperately needs to modify the cached database results and the value is a List<T> , then the only safe way to do so is to:

  1. Make a copy
  2. Make changes to the copy
  3. Ask the cache to update the value

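The three steps above, sketched with a ConcurrentDictionary as the top-level cache (key and element types are hypothetical):

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

var cache = new ConcurrentDictionary<string, IList<int>>();
cache["parent-1"] = new List<int> { 1, 2, 3 }.AsReadOnly();

// 1. Make a copy of the cached value.
var copy = cache["parent-1"].ToList();

// 2. Make changes to the copy only.
copy.Remove(2);
copy.Add(4);

// 3. Ask the cache to swap in the new value. The indexer replaces the
//    reference atomically: readers see the old list or the new one, never
//    a list being mutated underneath them.
cache["parent-1"] = copy.AsReadOnly();
```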
In summary, use a thread-safe dictionary for the top-level cache and an ordinary list for the cached value, being careful never to modify the contents of the list after inserting it into the cache.

Try modifying RefreshFooCache like so:

    public ReadOnlyCollection<Foo> RefreshFooCache(Foo parentFoo)
    {
        List<Foo> parentFooIncludingBelow;

        try
        {
            // Prevent other writers, but allow read operations to proceed.
            Lock.EnterUpgradeableReadLock();

            // Recheck the cache: another thread may already have refreshed it
            // before we acquired the lock.
            var cached = _cache.Get(parentFoo.Id.ToString()) as List<Foo>;
            if (cached != null)
            {
                return cached.AsReadOnly();
            }

            // Get the parent and everything below.
            parentFooIncludingBelow = _repo.FindFoosIncludingBelow(parentFoo.UniqueUri).ToList();

            // Now block other readers as well as other writers.
            Lock.EnterWriteLock();

            // Replace the stale entry.
            _cache.Remove(parentFoo.Id.ToString());
            _cache.Add(parentFoo.Id.ToString(), parentFooIncludingBelow);
        }
        finally
        {
            if (Lock.IsWriteLockHeld)
            {
                Lock.ExitWriteLock();
            }
            Lock.ExitUpgradeableReadLock();
        }

        return parentFooIncludingBelow.AsReadOnly();
    }

EDIT - updated to use ReadOnlyCollection instead of ConcurrentDictionary

Here's what I currently have implemented. I ran some load tests and didn't see any errors happen. What do you think of this implementation?

public class FooService
{
   private static ReaderWriterLockSlim _lock;
   private static readonly object SyncLock = new object();

   private static ReaderWriterLockSlim Lock
   {
      get
      {
         if (_lock == null)
         {
            lock (SyncLock)
            {
               if (_lock == null)
                  _lock = new ReaderWriterLockSlim();
            }
         }

         return _lock;
      }
   }

   public ReadOnlyCollection<Foo> RefreshFooCache(Foo parentFoo)
   {
      // Get the parent and everything below.
      var parentFooIncludingBelow = _repo.FindFoosIncludingBelow(parentFoo.UniqueUri).ToList().AsReadOnly();

      try
      {
         Lock.EnterWriteLock();

         // Replace the cached entry.
         _cache.Remove(parentFoo.Id.ToString());
         _cache.Add(parentFoo.Id.ToString(), parentFooIncludingBelow);
      }
      finally
      {
         Lock.ExitWriteLock();
      }

      return parentFooIncludingBelow;
   }

   public ReadOnlyCollection<Foo> FindFoo(string uniqueUri)
   {
      var parentIdForFoo = uniqueUri.GetParentId();
      ReadOnlyCollection<Foo> results = null;

      try
      {
         Lock.EnterReadLock();
         var cachedFoo = _cache.Get(parentIdForFoo) as ReadOnlyCollection<Foo>;

         if (cachedFoo != null)
            results = cachedFoo.FindFoosUnderneath(uniqueUri).ToList().AsReadOnly();
      }
      finally
      {
         Lock.ExitReadLock();
      }

      if (results == null)
      {
         // parentFoo is resolved from parentIdForFoo (lookup omitted here).
         results = RefreshFooCache(parentFoo).FindFoosUnderneath(uniqueUri).ToList().AsReadOnly();
      }

      return results;
   }
}

Summary:

  • Each Controller (i.e. each HTTP request) is given a new instance of FooService .
  • FooService has a static lock, so all the HTTP requests share the same lock instance.
  • I've used ReaderWriterLockSlim to ensure multiple threads can read but only one thread can write . I'm hoping this avoids "read" deadlocks. If two threads happen to need to "write" at the same time, then of course one will have to wait, but the goal here is to serve reads quickly.
  • The goal is to be able to find a "Foo" from the cache as long as something above it has been retrieved.

Any tips/suggestions/problems with this approach? I'm not sure if I'm locking only the write areas correctly. But because I need to run LINQ on the dictionary, I figured I need a read lock.
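One small simplification worth noting: the double-checked locking in the Lock property can be replaced with Lazy<T>, which performs the thread-safe one-time initialization for you (sketch with a hypothetical class name; the property is public here only so it can be exercised):

```csharp
using System;
using System.Threading;

public class LazyLockFooService
{
    // Lazy<T> defaults to LazyThreadSafetyMode.ExecutionAndPublication,
    // so the lock is created exactly once no matter how many threads race here,
    // and no manual lock(SyncLock) / null-recheck is needed.
    private static readonly Lazy<ReaderWriterLockSlim> LazyLock =
        new Lazy<ReaderWriterLockSlim>(() => new ReaderWriterLockSlim());

    public static ReaderWriterLockSlim Lock
    {
        get { return LazyLock.Value; }
    }
}
```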
