
Comparable merge of data design pattern

Is there a pattern or best practice for comparing data in the following scenario:

Each letter represents a chunk of data, in my case XML.

a+b+c+d

These are merged into one and returned. If I had merged a+b+c beforehand, it would be pretty simple to identify this "package" and then add d. But what if I had cached

a+c+d

and then the request for a+b+c+d came? What would be the best way to run through all the possible combinations to determine that adding b to the a+c+d package would give the desired result?

The order in which the data is merged is unimportant. And although it probably won't have any effect on the answer, the code is written in C# 4.0.

Edit: one more example.

Possible elements: a,b,c,d,e,f

Let's say I get a request for a + c + d + e, meaning an array with 0=a, 1=c, 2=d, 3=e.

In my "cache" I have the following already merged: c + d + e.

Then upon request I would have to find a way to do something like:

if (cache.Contains(request.elements[0] + request.elements[1] etc...))
else if (cache.Contains(request.elements[1] + request.elements[2] etc...))

It probably needs to be some sort of recursive loop, but as the number of possible elements in my case ends up in the 2-5000 range, it needs to be as fast and efficient as possible.

According to this:

"and then the request for a+b+c+d came, what would be the best way to run through all these possible combinations to determine that adding b to the a+c+d package would give the desired result?"

I am assuming the order does not matter, so it is possible to merge "b" with "acd" if you want "abcd". The only thing that matters is which elements are included.
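To make the "only the included elements matter" point concrete, here is a minimal sketch using the set operations built into HashSet<int>. The class name SetKeyDemo, the helper names CanReuse and Missing, and the numbering a=0, b=1, c=2, d=3 are assumptions made up for this illustration; they are not part of the code further down.

```csharp
using System;
using System.Collections.Generic;

static class SetKeyDemo
{
    // A cached package is reusable when every element in it is part of the request.
    public static bool CanReuse(HashSet<int> cached, HashSet<int> request)
    {
        return cached.IsSubsetOf(request);
    }

    // The elements that still have to be merged into the cached package.
    public static HashSet<int> Missing(HashSet<int> cached, HashSet<int> request)
    {
        var missing = new HashSet<int>(request);
        missing.ExceptWith(cached);
        return missing;
    }

    static void Main()
    {
        // Assumed numbering for illustration: a=0, b=1, c=2, d=3.
        var cached = new HashSet<int> { 0, 2, 3 };       // a + c + d
        var request = new HashSet<int> { 3, 1, 0, 2 };   // d + b + a + c, order is irrelevant

        Console.WriteLine(CanReuse(cached, request));                    // True
        Console.WriteLine(string.Join(",", Missing(cached, request)));   // 1

        // With a set comparer, sets holding the same elements are the same dictionary key.
        var cache = new Dictionary<HashSet<int>, string>(HashSet<int>.CreateSetComparer());
        cache[cached] = "acd";
        Console.WriteLine(cache.ContainsKey(new HashSet<int> { 3, 2, 0 }));   // True
    }
}
```

Without the comparer from HashSet<int>.CreateSetComparer(), a dictionary keyed by HashSet<int> compares keys by reference, so an exact hit can only be found by scanning the keys.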

Now, I have no idea what you are using for XML or how you are merging it, so I wrote this with strings and merge them by simply concatenating them. You will have to rewrite the Merge methods to do whatever it is you want to do (and change string everywhere to whatever type you are using). I also used integers instead of a, b, c because I assume you will have a lot more elements than there are letters in the alphabet.

Also, when you are looking for, say, a + b + c + d + e + f + g, and the best match in the cache is c + e + g + f, it will also look in the cache for the best match for the remainder, a + b + d, and so on, in order to further reduce the number of merges. If you don't want this (if, with your XML, you can't merge a + b with c + d into a + b + c + d), you can easily rewrite it without this step, but it will do more merges on average.
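The repeated "best match, then best match for the remainder" idea can be traced with a small sketch. MergePlan and Plan are hypothetical names invented for this illustration, and elements a..g are assumed to be numbered 0..6. Unlike the full code below, this only lists the parts that would be merged; it does not merge or cache anything.

```csharp
using System;
using System.Collections.Generic;

static class MergePlan
{
    // Hypothetical helper: lists the parts that would be merged for a query,
    // picking the largest cached subset first and then repeating on the remainder.
    public static List<HashSet<int>> Plan(List<HashSet<int>> cachedKeys, HashSet<int> query)
    {
        var parts = new List<HashSet<int>>();
        var remaining = new HashSet<int>(query);

        while (remaining.Count > 0)
        {
            HashSet<int> best = null;
            foreach (var key in cachedKeys)
            {
                bool larger = best == null || key.Count > best.Count;
                if (key.Count > 0 && larger && key.IsSubsetOf(remaining))
                    best = key;
            }

            if (best == null)
            {
                // No cached package fits: the rest must be merged from the base items.
                parts.Add(remaining);
                break;
            }

            parts.Add(best);
            remaining.ExceptWith(best);
        }

        return parts;
    }

    static void Main()
    {
        // Assumed numbering for illustration: a=0 ... g=6.
        var cachedKeys = new List<HashSet<int>>
        {
            new HashSet<int> { 2, 4, 5, 6 },   // c + e + f + g
            new HashSet<int> { 0, 1 },         // a + b
        };
        var query = new HashSet<int> { 0, 1, 2, 3, 4, 5, 6 };

        // One line per part: two cached packages, then the leftover element d.
        foreach (var part in Plan(cachedKeys, query))
            Console.WriteLine(string.Join("+", part));
    }
}
```

For this query the plan has three parts: the cached c + e + f + g, the cached a + b, and the single element d taken from the base items, so two merges are needed instead of six.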

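This should be fast as long as the cache stays reasonably small; note that every lookup scans all cached entries to find the best match. Look at the comments in the Main method to see what it does.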

using System;
using System.Collections.Generic;
using System.Text;

namespace ConsoleApplication17
{
    class CachedMerger
    {
        private Dictionary<HashSet<int>, string> _cache = new Dictionary<HashSet<int>, string>();
        private Dictionary<int, string> _items = new Dictionary<int, string>();

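        public void AddItem(int index, string item)
        {
            _items[index] = item;
            InvalidateCache(index);
        }

        public void RemoveItem(int index)
        {
            _items.Remove(index);
            InvalidateCache(index);
        }

        // Drop every cached merge that includes the changed item, so Get never returns stale data.
        private void InvalidateCache(int index)
        {
            var stale = new List<HashSet<int>>();
            foreach (var key in _cache.Keys)
                if (key.Contains(index))
                    stale.Add(key);

            foreach (var key in stale)
                _cache.Remove(key);
        }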

        private string Merge(string a, string b)
        {
            return a + b;
        }

        // Merges the base items named in the set; returns null if any of them is unknown.
        private string Merge(HashSet<int> list)
        {
            var sb = new StringBuilder();
            foreach (var index in list)
            {
                string item;
                if (!_items.TryGetValue(index, out item))
                    return null;

                sb.Append(item);
            }

            return sb.ToString();
        }

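        public string Get(HashSet<int> query)
        {
            var bestMatchKey = BestMatchKey(query);
            if (bestMatchKey == null)
            {
                // Nothing in the cache is a subset of the query: merge from the base items.
                var result = Merge(query);

                if (result == null)
                    throw new KeyNotFoundException("Requested item not found in the item list.");

                // Store a copy so later changes to the caller's set cannot corrupt the cache key.
                _cache[new HashSet<int>(query)] = result;
                return result;
            }

            // The cached set covers the whole query: return it as is.
            if (bestMatchKey.Count == query.Count)
                return _cache[bestMatchKey];

            // Otherwise resolve the remaining elements recursively and merge them in.
            var missing = new HashSet<int>(query);
            missing.ExceptWith(bestMatchKey);

            return Merge(_cache[bestMatchKey], Get(missing));
        }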

        // Returns the largest cached key that is a subset of the requested set, or null if there is none.
        private HashSet<int> BestMatchKey(HashSet<int> set)
        {
            HashSet<int> bestKey = null;
            int bestCount = 0;
            foreach (var key in _cache.Keys)
            {
                // Only check the subset relation when the key could beat the current best.
                if (key.Count > bestCount && key.IsSubsetOf(set))
                {
                    bestKey = key;
                    bestCount = key.Count;
                }
            }
            return bestKey;
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var cm = new CachedMerger();
            // Add all the base parts
            cm.AddItem(0, "sjkdlajkld");
            cm.AddItem(1, "dffdfdfdf");
            cm.AddItem(2, "qwqwqw");
            cm.AddItem(3, "yuyuyuyy");
            cm.AddItem(4, "kjkjkjkjkj");
            cm.AddItem(5, "oioyuyiyui");

            // This will merge 0 + 1 + 3 + 4 since the cache is empty
            Console.WriteLine(cm.Get(new HashSet<int> { 0, 1, 3, 4 }));
            // This will merge 2 + 5 as there is no match in the cache
            Console.WriteLine(cm.Get(new HashSet<int> { 2, 5 }));
            // This will merge (2 + 5) from the cache with 3
            Console.WriteLine(cm.Get(new HashSet<int> { 2, 3, 5 }));
            // This will merge (0 + 1 + 3 + 4) from the cache with (2 + 5) from the cache
            Console.WriteLine(cm.Get(new HashSet<int> { 0, 1, 2, 3, 4, 5 }));

            Console.Read();
        }
    }
}
