
Using C# Dictionary to parse log file

I am trying to parse a rather long log file and create a better, more manageable listing of issues.

I am able to read and parse the log line by line, but I need to display only unique entries, because some errors occur more often than others and are always recorded with identical text.

What I was going to try to do was create a Dictionary object to hold each unique entry and as I work through the log file, search the Dictionary object to see if the same values are already in there.

Here is a crude sample of the code I have (a work in progress, I hope I have all syntax right) that does not work. For some reason this script never sees any distinct entries (if statement never passes):

    string[] rowdta = new string[4];
    Dictionary<string[], int> dict = new Dictionary<string[], int>();
    int ctr = -1;
    if (linectr == 1)
    {
        ctr++;
        dict.Add(rowdta, ctr);
    }
    else
    {
        foreach (KeyValuePair<string[], int> pair in dict)
        {
            if ((pair.Key[1] != rowdta[1]) || (pair.Key[2] != rowdta[2]) || (pair.Key[3] != rowdta[3]))
            {
                ctr++;
                dict.Add(rowdta, ctr);
            }
        }
    }

Some sample data. First line:

    rowdta[0]="ErrorType";
    rowdta[1]="Undefined offset: 0";
    rowdta[2]="/url/routesDisplay2.svc.php";
    rowdta[3]="Line Number 5";

2nd line

    rowdta[0]="ErrorType";
    rowdta[1]="Undefined offset: 0";
    rowdta[2]="/url/routesDisplay2.svc.php";
    rowdta[3]="Line Number 5";

3rd line

    rowdta[0]="ErrorType";
    rowdta[1]="Undefined variable: fvmsg";
    rowdta[2]="/url/processes.svc.php";
    rowdta[3]="Line Number 787";

With this data, the Dictionary should end up with 2 items in it: the first line and the third line.

I have also tried the following, which also does not find any variations in the log file text:

    if (!dict.ContainsKey(rowdta)) {}

Can someone please help me get this syntax right? I am just a newbie at C# but this should be relatively straightforward. As always, I am thinking that this should be enough information to get the conversation started. If you want/need more detail, please let me know.

The reason you see the problem is that an array of strings cannot be used as a dictionary key by value: by default, arrays are compared by reference, so you need to supply a custom IEqualityComparer<string[]> or write a wrapper around the array.

EDIT: Here is a quick and dirty implementation of a custom comparer:

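// Nested helper class; needs using System.Collections.Generic and System.Linq.
private class ArrayEq<T> : IEqualityComparer<T[]> {
    public bool Equals(T[] x, T[] y) {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }
    public int GetHashCode(T[] obj) {
        // Enumerable.Sum throws OverflowException on int overflow,
        // so fold the element hashes in an unchecked context instead.
        return obj.Aggregate(17, (hash, o) =>
            unchecked(hash * 31 + (o == null ? 0 : o.GetHashCode())));
    }
}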

Here is how you can use it:

var dd = new Dictionary<string[], int>(new ArrayEq<string>());
dd[new[] { "a", "b" }] = 0;
dd[new[] { "a", "b" }]++;
dd[new[] { "a", "b" }]++;
Console.WriteLine(dd[new[] { "a", "b" }]);
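
Applied to the sample rows from the question, the same idea might look like the sketch below. `RowComparer` and `LogDemo.CountRows` are hypothetical names chosen for illustration, not part of the original answer, and the sketch assumes a fresh array is created for every log line.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two rows are equal when all their fields match.
public sealed class RowComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[] x, string[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(string[] row)
    {
        unchecked // let the arithmetic wrap around instead of overflowing
        {
            int hash = 17;
            foreach (string field in row)
                hash = hash * 31 + (field == null ? 0 : field.GetHashCode());
            return hash;
        }
    }
}

public static class LogDemo
{
    // Counts how often each distinct row occurs.
    public static Dictionary<string[], int> CountRows(IEnumerable<string[]> rows)
    {
        var counts = new Dictionary<string[], int>(new RowComparer());
        foreach (string[] row in rows)
        {
            int seen;
            counts.TryGetValue(row, out seen); // seen stays 0 for a row not yet stored
            counts[row] = seen + 1;
        }
        return counts;
    }

    public static void Main()
    {
        var rows = new List<string[]>
        {
            new[] { "ErrorType", "Undefined offset: 0", "/url/routesDisplay2.svc.php", "Line Number 5" },
            new[] { "ErrorType", "Undefined offset: 0", "/url/routesDisplay2.svc.php", "Line Number 5" },
            new[] { "ErrorType", "Undefined variable: fvmsg", "/url/processes.svc.php", "Line Number 787" },
        };

        foreach (var pair in CountRows(rows))
            Console.WriteLine(string.Join(" | ", pair.Key) + " x" + pair.Value);
    }
}
```

Using `TryGetValue` followed by an assignment avoids a separate `ContainsKey` lookup, and the resulting dictionary holds one key per distinct row together with its occurrence count.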

Either create a wrapper for your strings that implements IEquatable<LogFileEntry>:

public class LogFileEntry :IEquatable<LogFileEntry>
{
    private readonly string[] _rows;

    public LogFileEntry(string[] rows)
    {
        _rows = rows;
    }

    public override int GetHashCode()
    {
        // Combine the hash of every field; unchecked lets the arithmetic wrap around.
        unchecked
        {
            int hash = 17;
            foreach (string row in _rows)
                hash = hash * 31 + (row == null ? 0 : row.GetHashCode());
            return hash;
        }
    }

    #region Implementation of IEquatable<LogFileEntry>

    public override bool Equals(Object obj)
    {
        // "as" yields null for null or for any other type; the typed overload then returns false.
        return Equals(obj as LogFileEntry);
    }

    public bool Equals(LogFileEntry other)
    {
        if(other == null) 
            return false;

        return _rows.SequenceEqual(other._rows);
    }

    #endregion
}

Then use that in your dictionary:

var d = new Dictionary<LogFileEntry, int>();

var entry = new LogFileEntry(rows);
if( d.ContainsKey(entry) )
{
    d[entry] ++;
} 
else
{
    d[entry] = 1;
}

Or create a custom comparer similar to the one proposed by @dasblinkenlight and use it as follows:

public class LogFileEntry 
{
}

public class LogFileEntryComparer : IEqualityComparer<LogFileEntry>{ ... }

var d = new Dictionary<LogFileEntry, int>(new LogFileEntryComparer());

var entry = new LogFileEntry(rows);
if( d.ContainsKey(entry) )
{
    d[entry] ++;
} 
else
{
    d[entry] = 1;
}

The problem is that array equality is reference equality. In other words, it does not depend on the values stored in the array, it depends only on the identity of the array.
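
A minimal sketch of what reference equality means for dictionary keys follows; the class and method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReferenceEqualityDemo
{
    // Returns whether `probe` is found in a dictionary whose only key is `key`,
    // using the default (reference-based) comparer for arrays.
    public static bool FoundWithDefaultComparer(string[] key, string[] probe)
    {
        var dict = new Dictionary<string[], int> { { key, 0 } };
        return dict.ContainsKey(probe);
    }

    public static void Main()
    {
        string[] first = { "Undefined offset: 0", "Line Number 5" };
        string[] second = { "Undefined offset: 0", "Line Number 5" };

        Console.WriteLine(first.Equals(second));                     // False: two different array objects
        Console.WriteLine(FoundWithDefaultComparer(first, second));  // False: lookup goes by reference
        Console.WriteLine(FoundWithDefaultComparer(first, first));   // True: same array object
        Console.WriteLine(first.SequenceEqual(second));              // True: element-by-element comparison
    }
}
```

The reverse case may also be relevant to the question: if a single array instance is refilled and reused for every log line, every lookup sees the same object, so the entries can never appear to differ.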

Some solutions:

  • use Tuple to hold the row data
  • use an anonymous type to hold the row data
  • create a custom type to hold the row data, and, if it is a class, override Equals and GetHashCode.
  • create a custom implementation of IEqualityComparer to compare the arrays according to their values, and pass that to the dictionary when you create it.
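
As an illustration of the first option, here is a sketch that keys the dictionary on a `Tuple` built from the three fields the question compares (message, file, line number). `TupleKeyDemo` and `CountUnique` are hypothetical names, and ignoring field 0 is an assumption based on the comparison in the question's code.

```csharp
using System;
using System.Collections.Generic;

public static class TupleKeyDemo
{
    // Counts rows keyed on fields 1..3; Tuple compares its items by value,
    // so identical rows map to the same key.
    public static Dictionary<Tuple<string, string, string>, int> CountUnique(IEnumerable<string[]> rows)
    {
        var counts = new Dictionary<Tuple<string, string, string>, int>();
        foreach (string[] row in rows)
        {
            var key = Tuple.Create(row[1], row[2], row[3]);
            int seen;
            counts.TryGetValue(key, out seen); // seen stays 0 for a key not yet stored
            counts[key] = seen + 1;
        }
        return counts;
    }

    public static void Main()
    {
        var rows = new List<string[]>
        {
            new[] { "ErrorType", "Undefined offset: 0", "/url/routesDisplay2.svc.php", "Line Number 5" },
            new[] { "ErrorType", "Undefined offset: 0", "/url/routesDisplay2.svc.php", "Line Number 5" },
            new[] { "ErrorType", "Undefined variable: fvmsg", "/url/processes.svc.php", "Line Number 787" },
        };

        foreach (var pair in CountUnique(rows))
            Console.WriteLine(pair.Key.Item1 + " in " + pair.Key.Item2 + " (" + pair.Key.Item3 + ") x" + pair.Value);
    }
}
```

With the three sample rows from the question, the dictionary ends up with two keys, matching the expected result of "first line and third line".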
