
Finding differences in two lists

I am looking for a good way to find the differences between two lists.

Here is the problem:

Both lists contain strings in which the first three '*'-delimited fields form a unique key, followed by the text (String = "key1*key2*key3*text").

Here is an example string:

AA1*1D*4*The quick brown fox*****CC*3456321234543~

where "AA1*1D*4" is the unique key.

List1: "index1*index2*index3", "index2*index2*index3", "index3*index2*index3"

List2: "index2*index2*index3", "index1*index2*index3", "index3*index2*index3", "index4*index2*index3"

I need to match the indexes in both lists and compare the entries.

  1. If all 3 indexes of an entry in one list match the 3 indexes of an entry in the other list, I need to track both string entries in the new list.

  2. If a set of indexes appears in one list but not in the other, I need to track the entry on one side and keep an empty entry on the other side (index4 in the example above).

Then return the list.

This is what I have done so far, but I am struggling:

        List<String> Base = baseListCopy.Except(resultListCopy, StringComparer.InvariantCultureIgnoreCase).ToList(); //Keep unique values(keep differences in lists)
        List<String> Result = resultListCopy.Except(baseListCopy, StringComparer.InvariantCultureIgnoreCase).ToList(); //Keep unique values (keep differences in lists)

        List<String[]> blocksComparison = new List<String[]>(); // container for the non-matching blocks, so we can output them later

        //if both reports have same amount of blocks
        if ((Result.Count > 0 || Base.Count > 0) && (Result.Count == Base.Count))
        {
            foreach (String S in Result)
            {
                String[] sArr = S.Split('*');
                foreach (String B in Base)
                {
                    String[] bArr = B.Split('*');

                    if (sArr[0].Equals(bArr[0]) && sArr[1].Equals(bArr[1]) && sArr[2].Equals(bArr[2]) && sArr[3].Equals(bArr[3]))
                    {
                        String[] NA = new String[2]; //keep results
                        NA[0] = B; //[0] for base
                        NA[1] = S; //[1] for result
                        blocksComparison.Add(NA);
                        break;
                    }
                }
            }
        }

Could you suggest a good algorithm for this process?

Thank you

You can use a HashSet.

Create a HashSet for List1. Remember that index1*index2*index3 is different from index3*index2*index1.

Now iterate through the second list and check each entry against the set:

var set = new HashSet<string>(list1);
var common = new List<string>();

foreach (string s in list2)
{
    if (set.Contains(s))
        common.Add(s); // the entry is present in both lists
}
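
The loop above compares whole strings, whereas the question matches entries on the first three fields only. Here is a minimal sketch of the same HashSet idea keyed on that prefix. `KeyMatch`, `KeyOf` and `InBoth` are hypothetical names introduced for illustration, and the ordinal (case-sensitive) comparison is an assumption; the question's own code ignores case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class KeyMatch
{
    // Hypothetical helper: the first three '*'-delimited fields, rejoined as one key.
    // Splitting into at most 4 parts keeps the free text (which may contain '*') in one piece.
    public static string KeyOf(string line)
    {
        string[] parts = line.Split(new[] { '*' }, 4);
        return string.Join("*", parts.Take(3));
    }

    // Returns the entries of list2 whose key also occurs in list1.
    public static List<string> InBoth(IEnumerable<string> list1, IEnumerable<string> list2)
    {
        // Assumption: keys are compared case-sensitively.
        var keys = new HashSet<string>(list1.Select(l => KeyOf(l)), StringComparer.Ordinal);
        return list2.Where(l => keys.Contains(KeyOf(l))).ToList();
    }
}
```

Calling `KeyMatch.InBoth(list1, list2)` returns the list2 entries whose key is shared with list1, even when the text after the key differs.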
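
A Java version that uses a HashMap to count the entries of the first list:

// requires: import java.util.*;
List<String> one = new ArrayList<>();
List<String> two = new ArrayList<>();
List<String> three = new ArrayList<>();
Map<String, Integer> intersect = new HashMap<>();

// Count how often each entry occurs in the first list.
for (String index : one)
{
    intersect.merge(index, 1, Integer::sum);
}

// Keep the entries of the second list that also occur in the first.
for (String index : two)
{
    if (intersect.containsKey(index))
    {
        three.add(index);
    }
}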

If I understand your question correctly, you'd like to compare the elements by their "key" prefix instead of by the whole string content. If so, implementing a custom equality comparer will let you leverage the LINQ set algorithms easily.

This program...

using System;
using System.Collections.Generic;
using System.Linq;

class EqCmp : IEqualityComparer<string> {

    public bool Equals(string x, string y) {
        return GetKey(x).SequenceEqual(GetKey(y));
    }

    public int GetHashCode(string obj) {
        // Using Sum could cause OverflowException.
        return GetKey(obj).Aggregate(0, (sum, subkey) => sum + subkey.GetHashCode());
    }

    static IEnumerable<string> GetKey(string line) {
        // If we just split to 3 strings, the last one could exceed the key, so we split to 4.
        // This is not the most efficient way, but is simple.
        return line.Split(new[] { '*' }, 4).Take(3);
    }

}

class Program {

    static void Main(string[] args) {

        var l1 = new List<string> {
            "index1*index1*index1*some text",
            "index1*index1*index2*some text ** test test test",
            "index1*index2*index1*some text",
            "index1*index2*index2*some text",
            "index2*index1*index1*some text"
        };

        var l2 = new List<string> {
            "index1*index1*index2*some text ** test test test",
            "index2*index1*index1*some text",
            "index2*index1*index2*some text"
        };

        var eq = new EqCmp();

        Console.WriteLine("Elements that are both in l1 and l2:");
        foreach (var line in l1.Intersect(l2, eq))
            Console.WriteLine(line);

        Console.WriteLine("\nElements that are in l1 but not in l2:");
        foreach (var line in l1.Except(l2, eq))
            Console.WriteLine(line);

        // Etc...

    }

}

...prints the following result:

Elements that are both in l1 and l2:
index1*index1*index2*some text ** test test test
index2*index1*index1*some text

Elements that are in l1 but not in l2:
index1*index1*index1*some text
index1*index2*index1*some text
index1*index2*index2*some text
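
Intersect and Except return entries from one list only, while the question asks for base/result pairs with an empty entry on the missing side. Here is a rough sketch of that pairing, assuming keys are unique within each list and compared case-sensitively. `BlockPairing`, `KeyOf` and `Pair` are hypothetical names, not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BlockPairing
{
    // Hypothetical helper: the first three '*'-delimited fields, rejoined as one key.
    static string KeyOf(string line)
    {
        return string.Join("*", line.Split(new[] { '*' }, 4).Take(3));
    }

    // Returns one string[2] per key: [0] is the base entry, [1] is the result entry.
    // A side that has no entry for the key is left as an empty string.
    public static List<string[]> Pair(IList<string> baseList, IList<string> resultList)
    {
        // Index the result entries by key (assumption: keys are unique; the last one wins).
        var resultByKey = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (string line in resultList)
            resultByKey[KeyOf(line)] = line;

        var pairs = new List<string[]>();
        var seen = new HashSet<string>(StringComparer.Ordinal);

        // Every base entry, paired with its match or with an empty entry.
        foreach (string line in baseList)
        {
            string key = KeyOf(line);
            seen.Add(key);
            string match;
            if (!resultByKey.TryGetValue(key, out match))
                match = string.Empty;
            pairs.Add(new[] { line, match });
        }

        // Result entries whose key never appeared in the base list.
        foreach (string line in resultList)
        {
            if (!seen.Contains(KeyOf(line)))
                pairs.Add(new[] { string.Empty, line });
        }

        return pairs;
    }
}
```

The dictionary makes each lookup constant time on average, so the whole pass is linear in the combined size of the two lists rather than quadratic like the nested loops in the question.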
