
Optimize cache with multiple keys in C# - remove duplication of objects

I have a project in Asp.Net Core. This project has an ICacheService as below:

public interface ICacheService
{
    T Get<T>(string key);
    T Get<T>(string key, Func<T> getdata);
    Task<T> Get<T>(string key, Func<Task<T>> getdata); 
    void AddOrUpdate(string key, object value);
} 

The implementation is simply based on ConcurrentDictionary<string, object>, so it's not that complicated: it just stores and retrieves data from this dictionary.
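For illustration, a minimal sketch of what such an implementation might look like (the actual class is not shown here, and the name InMemoryCacheService is hypothetical):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class InMemoryCacheService : ICacheService
{
    private readonly ConcurrentDictionary<string, object> _cache =
        new ConcurrentDictionary<string, object>();

    public T Get<T>(string key) =>
        _cache.TryGetValue(key, out var value) ? (T)value : default(T);

    public T Get<T>(string key, Func<T> getdata) =>
        (T)_cache.GetOrAdd(key, _ => getdata());

    public async Task<T> Get<T>(string key, Func<Task<T>> getdata)
    {
        if (_cache.TryGetValue(key, out var value))
            return (T)value;
        var data = await getdata();
        _cache.TryAdd(key, data);
        return data;
    }

    public void AddOrUpdate(string key, object value) => _cache[key] = value;
}

At one of my services I have a method as below: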

public async Task<List<LanguageInfoModel>> GetLanguagesAsync(string frontendId, string languageId, string accessId) 
{
    async Task<List<LanguageInfoModel>> GetLanguageInfoModel()
    {
        var data = await _commonServiceProxy.GetLanguages(frontendId, languageId, accessId);
        return data;
    }

    _scheduler.ScheduleAsync($"{CacheKeys.Jobs.LanguagesJob}_{frontendId}_{languageId}_{accessId}", async () =>
    {
        _cacheService.AddOrUpdate($"{CacheKeys.Languages}_{frontendId}_{languageId}_{accessId}", await GetLanguageInfoModel());
        return JobStatus.Success;
    }, TimeSpan.FromMinutes(5.0));

    return await _cacheService.Get($"{CacheKeys.Languages}_{frontendId}_{languageId}_{accessId}", async () => await GetLanguageInfoModel());
}

The problem is that I have three params in this method that I use as a cache key. This works fine, but the number of possible combinations of the three params is pretty high, so there will be many duplicated objects in the cache. I was thinking of creating a cache without duplication, like below:

The idea is a cache with a list as the key, so that more than one key can point to a single object. When I get new elements I would check each of them: if it is already in the cache, I only add a key to the key list; otherwise I insert a new element into the cache. The problem here is that testing whether an object is already in the cache is expensive: it would need some serialization into a comparable form, and the comparison itself would also consume a lot of resources. The cache might look something like this: CustomDictionary<List<string>, object>

Does anybody know a good approach to solving this issue so that objects are not duplicated in the cache?

EDIT 1:

My main concern is when I retrieve List<MyModel> from my web services, because 80% of the objects might contain the same data, which drastically increases the size in memory. But this would be relevant for simple cases as well. Let's suppose I have something like this:

MyModel o1 = new MyModel();
_cache.Set("key1", o1);
_cache.Set("key2", o1);

In this case, when trying to add the same object twice, I would like it not to be duplicated but to have key2 somehow point to the same object as key1. If this is achieved, invalidating them becomes a problem, but I would expect something like this:

_cache.Invalidate("key2");

This would check whether there is another key pointing to the same object. If so, it would only remove the key; otherwise it would destroy the object itself.

Maybe we could reformulate this problem into two separate issues:

  1. executing the call for each combination and
  2. storing n times the identical result, wasting tons of memory

For 1 I don't have any idea how we could prevent it, as we do not know prior to execution whether we will fetch a duplicate in this setup. We would need more information, based on when these values vary, which may or may not be possible.

For 2, one solution would be to override GetHashCode (and Equals) so they are based on the actual returned values. A good solution would be generic and walk through the object tree (which can probably get expensive). I would like to know if there are any pre-made solutions for this, actually.
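For illustration, a minimal sketch of such a value-based override, using fields from the question's LanguageInfoModel (the exact property set is an assumption):

public class LanguageInfoModel
{
    public string FrontendId { get; set; }
    public string LanguageId { get; set; }
    public string AccessId { get; set; }

    // Two instances with identical data now compare (and hash) as equal,
    // so a dictionary or hash set can detect the duplicate.
    public override bool Equals(object obj) =>
        obj is LanguageInfoModel other &&
        (FrontendId, LanguageId, AccessId).Equals((other.FrontendId, other.LanguageId, other.AccessId));

    public override int GetHashCode() =>
        (FrontendId, LanguageId, AccessId).GetHashCode();
}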

This answer is specifically for returning List<TItem>s, rather than just individual TItems, and it avoids duplication of any TItem as well as of any list. It uses arrays because you're trying to save memory, and an array uses less of it than a List.

Note that for this (and really any solution) to work, you MUST override Equals and GetHashCode on TItem so that it knows what a duplicate item is. (Unless the data provider returns the same object instances each time, which is unlikely.) If you don't have control of TItem, but you can yourself determine whether two TItems are equal, you can use an IEqualityComparer to do this; the solution below would only need to be modified very slightly (a sketch of that change follows the code).

View the solution with a basic test at: https://dotnetfiddle.net/pKHLQP

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public class DuplicateFreeCache<TKey, TItem> where TItem : class
{
    // Maps a cache key to the index of its list in ListList.
    private ConcurrentDictionary<TKey, int> Primary { get; } = new ConcurrentDictionary<TKey, int>();
    // Every unique item, in insertion order.
    private List<TItem> ItemList { get; } = new List<TItem>();
    // Every unique list of items, in insertion order.
    private List<TItem[]> ListList { get; } = new List<TItem[]>();
    // Reverse lookup: item -> its index in ItemList.
    private Dictionary<TItem, int> ItemDict { get; } = new Dictionary<TItem, int>();
    // Reverse lookup: array of item indices -> its index in ListList.
    private Dictionary<IntArray, int> ListDict { get; } = new Dictionary<IntArray, int>();

    public IReadOnlyList<TItem> GetOrAdd(TKey key, Func<TKey, IEnumerable<TItem>> getFunc)
    {
        int index = Primary.GetOrAdd(key, k =>
        {
            var rawList = getFunc(k);

            lock (Primary)
            {
                int[] itemListByIndex = rawList.Select(item =>
                {
                    if (!ItemDict.TryGetValue(item, out int itemIndex))
                    {
                        itemIndex = ItemList.Count;
                        ItemList.Add(item);
                        ItemDict[item] = itemIndex;
                    }
                    return itemIndex;
                }).ToArray();

                var intArray = new IntArray(itemListByIndex);

                if (!ListDict.TryGetValue(intArray, out int listIndex))
                {
                    lock (ListList)
                    {
                        listIndex = ListList.Count;
                        ListList.Add(itemListByIndex.Select(ii => ItemList[ii]).ToArray());
                    }
                    ListDict[intArray] = listIndex;
                }

                return listIndex;
            }
        });

        lock (ListList)
        {
            return ListList[index];
        }
    }


    public override string ToString()
    {
        StringBuilder sb = new StringBuilder();
        sb.AppendLine($"A cache with:");
        sb.AppendLine($"{ItemList.Count} unique Items;");
        sb.AppendLine($"{ListList.Count} unique lists of Items;");
        sb.AppendLine($"{Primary.Count} primary dictionary items;");
        sb.AppendLine($"{ItemDict.Count} item dictionary items;");
        sb.AppendLine($"{ListDict.Count} list dictionary items;");
        return sb.ToString();
    }

    //We have this to make Dictionary lookups on int[] find identical arrays.
    //One could also just make an IEqualityComparer, but I felt like doing it this way.
    public class IntArray
    {
        private readonly int _hashCode;
        public int[] Array { get; }
        public IntArray(int[] arr)
        {
            Array = arr;
            unchecked
            {
                _hashCode = 0;
                for (int i = 0; i < arr.Length; i++)
                    _hashCode = (_hashCode * 397) ^ arr[i];
            }
        }

        protected bool Equals(IntArray other)
        {
            return Array.SequenceEqual(other.Array);
        }

        public override bool Equals(object obj)
        {
            if (ReferenceEquals(null, obj)) return false;
            if (ReferenceEquals(this, obj)) return true;
            if (obj.GetType() != this.GetType()) return false;
            return Equals((IntArray)obj);
        }

        public override int GetHashCode() => _hashCode;
    }
}
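As noted above, supporting an external IEqualityComparer<TItem> would only need a slight modification: construct ItemDict with the comparer, since ItemDict is the only place TItems are compared. A hedged sketch (this extra constructor is not part of the original solution):

// Hypothetical extra constructor; keep a parameterless one too if you still need it.
public DuplicateFreeCache(IEqualityComparer<TItem> comparer)
{
    ItemDict = new Dictionary<TItem, int>(comparer);
}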

It occurred to me that a ReaderWriterLockSlim would be better than the lock(ListList) , if the lock is causing performance to lag, but it's very slightly more complicated.
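For reference, a hedged sketch of what that swap might look like inside the class above (only the members that would change are shown; requires using System.Threading):

private readonly ReaderWriterLockSlim _listLock = new ReaderWriterLockSlim();

// Replaces lock (ListList) on the read path: many readers can proceed in parallel.
private TItem[] ReadList(int index)
{
    _listLock.EnterReadLock();
    try { return ListList[index]; }
    finally { _listLock.ExitReadLock(); }
}

// Replaces lock (ListList) on the write path inside the GetOrAdd factory.
private int AppendList(TItem[] items)
{
    _listLock.EnterWriteLock();
    try
    {
        ListList.Add(items);
        return ListList.Count - 1;
    }
    finally { _listLock.ExitWriteLock(); }
}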

Similar to @MineR's, this solution performs a 'double caching' operation: it caches the keyed lists (lookups) as well as the individual objects, performing automatic deduplication.

It is a fairly simple solution using two ConcurrentDictionaries - one acting as a HashSet and one as a keyed lookup. This allows most of the threading concerns to be handled by the framework.

You can also pass in and share the hash set between multiple CachedLookups, allowing lookups with different keys.

Note that object equality or an IEqualityComparer is required to make any such solution function.

Class:

public class CachedLookup<T, TKey>
{        
    private readonly ConcurrentDictionary<T, T> _hashSet;
    private readonly ConcurrentDictionary<TKey, List<T>> _lookup = new ConcurrentDictionary<TKey, List<T>>();

    public CachedLookup(ConcurrentDictionary<T, T> hashSet)
    {
        _hashSet = hashSet;
    }   

    public CachedLookup(IEqualityComparer<T> equalityComparer = default)
    {
        _hashSet = equalityComparer is null ? new ConcurrentDictionary<T, T>() : new ConcurrentDictionary<T, T>(equalityComparer);
    }

    public List<T> Get(TKey key) => _lookup.TryGetValue(key, out var cached) ? cached : null;

    public List<T> Get(TKey key, Func<TKey, List<T>> getData)
    {
        if (_lookup.TryGetValue(key, out var cached))
            return cached;

        var result = DedupeAndCache(getData(key));

        _lookup.TryAdd(key, result);

        return result;
    }

    public async ValueTask<List<T>> GetAsync(TKey key, Func<TKey, Task<List<T>>> getData)
    {
        if (_lookup.TryGetValue(key, out var cached))
            return cached;

        var result = DedupeAndCache(await getData(key));

        _lookup.TryAdd(key, result);

        return result;
    }

    public void Add(T value) => _hashSet.TryAdd(value, value);

    public List<T> AddOrUpdate(TKey key, List<T> data)
    {            
        var deduped = DedupeAndCache(data);

        _lookup.AddOrUpdate(key, deduped, (k,l)=>deduped);

        return deduped;
    }

    private List<T> DedupeAndCache(IEnumerable<T> input) => input.Select(v => _hashSet.GetOrAdd(v,v)).ToList();
}

Example Usage:

public class ExampleUsage
{
    private readonly CachedLookup<LanguageInfoModel, (string frontendId, string languageId, string accessId)> _lookup 
        = new CachedLookup<LanguageInfoModel, (string frontendId, string languageId, string accessId)>(new LanguageInfoModelComparer());

    public ValueTask<List<LanguageInfoModel>> GetLanguagesAsync(string frontendId, string languageId, string accessId)
    {
        return _lookup.GetAsync((frontendId, languageId, accessId), GetLanguagesFromDB);
    }

    private async Task<List<LanguageInfoModel>> GetLanguagesFromDB((string frontendId, string languageId, string accessId) key) => throw new NotImplementedException();
}

public class LanguageInfoModel
{
    public string FrontendId { get; set; }
    public string LanguageId { get; set; }
    public string AccessId { get; set; }
    public string SomeOtherUniqueValue { get; set; }
}

public class LanguageInfoModelComparer : IEqualityComparer<LanguageInfoModel>
{
    public bool Equals(LanguageInfoModel x, LanguageInfoModel y)
    {
        return (x?.FrontendId, x?.AccessId, x?.LanguageId, x?.SomeOtherUniqueValue)
            .Equals((y?.FrontendId, y?.AccessId, y?.LanguageId, y?.SomeOtherUniqueValue));
    }

    public int GetHashCode(LanguageInfoModel obj) => 
        (obj.FrontendId, obj.LanguageId, obj.AccessId, obj.SomeOtherUniqueValue).GetHashCode();
}

Notes:

The CachedLookup class is generic on both the value and the key. The example use of ValueTuple makes it easy to have compound keys. I have also used ValueTuples to simplify the equality comparisons.

This usage of ValueTask fits nicely with its intended purpose, returning the cached list synchronously.

If you have access to the lower-level data access layer, one optimization would be to move the deduplication to before the objects are instantiated (based on property value equality). This would reduce the allocations and load on the GC.
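A hedged sketch of that idea (the pool class and its placement in the data layer are assumptions): materialization resolves rows with equal property values to a single cached instance instead of allocating a new one each time:

using System.Collections.Concurrent;

public class LanguageInfoModelPool
{
    // Keyed by the property values themselves, so equal rows map to one instance.
    private readonly ConcurrentDictionary<(string, string, string, string), LanguageInfoModel> _pool =
        new ConcurrentDictionary<(string, string, string, string), LanguageInfoModel>();

    public LanguageInfoModel GetOrCreate(string frontendId, string languageId, string accessId, string someOtherUniqueValue) =>
        _pool.GetOrAdd((frontendId, languageId, accessId, someOtherUniqueValue), k => new LanguageInfoModel
        {
            FrontendId = k.Item1,
            LanguageId = k.Item2,
            AccessId = k.Item3,
            SomeOtherUniqueValue = k.Item4
        });
}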

If you have control over your complete solution, then you can do something like this:

  1. Identify each type of object that is capable of being stored in the cache. All such objects implement a common interface:

     public interface ICacheable
     {
         // Implements the logic to calculate each object's identity.
         // You could use the hash code, but you have to add some other value to it too.
         string ObjectId();
     }
  2. Now when you store an object in the cache, you do two things (see the sketch after this list):

    • Store the mapping both ways: one cache stores Key to ObjectId.
    • Another contains ObjectId to Object.

    • The overall idea is that when you get an object, you search the first cache to see whether the key you want is already there against an ObjectId. If yes, no further action is needed; otherwise you create a new Key-to-ObjectId entry in the first cache.

    • If the object is not present at all, you have to create an entry in both caches.
Note: you will have to overcome a performance issue here, because your key is a kind of list, which creates problems while searching.
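A minimal sketch of the two-cache idea described above (the class and member names are hypothetical):

using System.Collections.Concurrent;

public class TwoWayCache
{
    // First cache: Key -> ObjectId.
    private readonly ConcurrentDictionary<string, string> _keyToObjectId =
        new ConcurrentDictionary<string, string>();

    // Second cache: ObjectId -> object, so each object is stored only once.
    private readonly ConcurrentDictionary<string, ICacheable> _objectIdToObject =
        new ConcurrentDictionary<string, ICacheable>();

    public void Set(string key, ICacheable value)
    {
        var objectId = value.ObjectId();
        _keyToObjectId[key] = objectId;
        _objectIdToObject.TryAdd(objectId, value);
    }

    public ICacheable Get(string key) =>
        _keyToObjectId.TryGetValue(key, out var objectId) &&
        _objectIdToObject.TryGetValue(objectId, out var value)
            ? value
            : null;
}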

It sounds to me as though you need to implement some sort of index. Assuming that your model is fairly large (which is why you want to save memory), you could do this with two concurrent dictionaries.

The first would be a ConcurrentDictionary<string, int> (or whatever unique id applies to your model object) and would contain your key values. Each key will obviously be different, as per all your combinations, but you are only duplicating the int unique id for each of your objects, not the entire object.

The second dictionary would be a ConcurrentDictionary<int, object> or ConcurrentDictionary<int, T> and would contain your unique large objects indexed via their unique key.

When building the cache you would need to populate both dictionaries, the exact method would depend upon how you are doing it at the moment.

To retrieve an object you would build the key as you do at the moment, retrieve the unique id from the first dictionary, and then use it to locate the actual object in the second dictionary.

It is also possible to invalidate one key without invalidating the main object while another key is still using it, although this does require you to iterate over the index dictionary to check whether any other key points to the same object.
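A hedged sketch of that index, including the invalidation scan just described (types and names are illustrative):

using System.Collections.Concurrent;
using System.Linq;

public class IndexedCache<T> where T : class
{
    // Index: composite string key -> the model's unique id.
    private readonly ConcurrentDictionary<string, int> _index =
        new ConcurrentDictionary<string, int>();

    // Store: unique id -> the large object, held only once.
    private readonly ConcurrentDictionary<int, T> _objects =
        new ConcurrentDictionary<int, T>();

    public void Set(string key, int id, T value)
    {
        _index[key] = id;
        _objects.TryAdd(id, value);
    }

    public T Get(string key) =>
        _index.TryGetValue(key, out var id) && _objects.TryGetValue(id, out var value)
            ? value
            : null;

    public void Invalidate(string key)
    {
        if (!_index.TryRemove(key, out var id))
            return;

        // Iterate the index to check whether any other key still points to this object.
        if (!_index.Values.Contains(id))
            _objects.TryRemove(id, out _);
    }
}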

I think this is not a classic caching concern, where one key maps to one and only one piece of data; your case is not like that. You are trying to make a local in-memory data repository work as cached data, creating mappings between keys and objects loaded from a remote service. One key can map to many objects, and one object can be mapped by many keys, so the relationship is n <======> n.

I have created a sample model as follows:

(image: diagram of the Key, KeyMyModel, MyModel and RemoteModel relationships)

Key, KeyMyModel and MyModel are classes for the caching handler; RemoteModel is the class you get from the remote service.

With these models you are able to meet the requirements. This uses the entity Id to identify an object, so there is no need for hashing to detect duplicates. It is very basic: I have implemented the Set method, and invalidating a key is very similar (a sketch of Invalidate follows the code below). You must also write code that ensures thread safety.

public class MyModel
{
    public RemoteModel RemoteModel { get; set; }
    public List<KeyMyModel> KeyMyModels { get; set; } = new List<KeyMyModel>();
}

public class RemoteModel
{
    public string Id { get; set; } // Identity property that comes from the remote service
    public string DummyProperty { get; set; } // Some properties returned by the remote service
}

public class KeyMyModel
{
    public string Key { get; set; }
    public string MyModelId { get; set; }
}

public class Key
{
    public string KeyStr { get; set; }
    public List<KeyMyModel> KeyMyModels { get; set; } = new List<KeyMyModel>();
}

public interface ICacheService
{
    List<RemoteModel> Get(string key);
    List<RemoteModel> Get(string key, Func<List<RemoteModel>> getdata);
    Task<List<RemoteModel>> Get(string key, Func<Task<List<RemoteModel>>> getdata);
    void AddOrUpdate(string key, object value);
}

public class CacheService : ICacheService
{
    public List<MyModel> MyModels { get; private set; }
    public List<Key> Keys { get; private set; }
    public List<KeyMyModel> KeyMyModels { get; private set; }

    public CacheService()
    {
        MyModels = new List<MyModel>();
        Keys = new List<Key>();
        KeyMyModels = new List<KeyMyModel>();
    }

    public List<RemoteModel> Get(string key)
    {
        return MyModels.Where(s => s.KeyMyModels.Any(t => t.Key == key)).Select(s => s.RemoteModel).ToList();
    }

    public List<RemoteModel> Get(string key, Func<List<RemoteModel>> getdata)
    {
        var remoteData = getdata();
        Set(key, remoteData);

        return MyModels.Where(s => s.KeyMyModels.Any(t => t.Key == key)).Select(s => s.RemoteModel).ToList();
    }

    public Task<List<RemoteModel>> Get(string key, Func<Task<List<RemoteModel>>> getdata)
    {
        throw new NotImplementedException();
    }

    public void AddOrUpdate(string key, object value)
    {
        throw new NotImplementedException();
    }

    public void Invalidate(string key)
    {
    }

    public void Set(string key, List<RemoteModel> data)
    {
        // Find or create the Key entry; add it to Keys only once.
        var cacheKey = Keys.FirstOrDefault(s => s.KeyStr == key);
        if (cacheKey == null)
        {
            cacheKey = new Key { KeyStr = key };
            Keys.Add(cacheKey);
        }

        foreach (var remoteModel in data)
        {
            var exist = MyModels.FirstOrDefault(s => s.RemoteModel.Id == remoteModel.Id);
            if (exist == null)
            {
                // Not cached yet: store the object once and link it to this key.
                var myModel = new MyModel { RemoteModel = remoteModel };
                var keyMyModel = new KeyMyModel
                {
                    Key = key,
                    MyModelId = remoteModel.Id
                };
                myModel.KeyMyModels.Add(keyMyModel);
                cacheKey.KeyMyModels.Add(keyMyModel);
                MyModels.Add(myModel);
                KeyMyModels.Add(keyMyModel);
            }
            else
            {
                // Already cached: refresh the data and link this key if it is not linked yet.
                exist.RemoteModel = remoteModel;
                var existKeyMyModel =
                    KeyMyModels.FirstOrDefault(s => s.Key == key && s.MyModelId == exist.RemoteModel.Id);
                if (existKeyMyModel == null)
                {
                    existKeyMyModel = new KeyMyModel
                    {
                        Key = key,
                        MyModelId = exist.RemoteModel.Id
                    };
                    cacheKey.KeyMyModels.Add(existKeyMyModel);
                    exist.KeyMyModels.Add(existKeyMyModel);
                    KeyMyModels.Add(existKeyMyModel);
                }
            }
        }

        // Remove links to models that the remote service no longer returns for this key.
        var remoteIds = data.Select(s => s.Id);
        var currentIds = KeyMyModels.Where(s => s.Key == key).Select(s => s.MyModelId);
        var removingIds = currentIds.Except(remoteIds).ToList();
        var removingKeyMyModels = KeyMyModels.Where(s => s.Key == key && removingIds.Contains(s.MyModelId)).ToList();
        removingKeyMyModels.ForEach(s =>
        {
            KeyMyModels.Remove(s);
            cacheKey.KeyMyModels.Remove(s);
        });
    }
}

class CacheConsumer
{
    private readonly CacheService _cacheService = new CacheService();

    public List<RemoteModel> GetMyModels(string frontendId, string languageId, string accessId)
    {
        var key = $"{frontendId}_{languageId}_{accessId}";
        return _cacheService.Get(key, () =>
        {
            // call the remote service here
            return new List<RemoteModel>();
        });
    }
}