
Using C# Dictionary to parse log file

I am trying to parse a rather long log file and create a more manageable listing of issues.

I am able to read and parse the log line by line, but what I need to do is display only unique entries, as some errors occur more often than others and are always recorded with identical text.

What I was going to try was to create a Dictionary object to hold each unique entry and, as I work through the log file, search the Dictionary object to see if the same values are already in there.

Here is a crude sample of the code I have (a work in progress, I hope I have all syntax right) that does not work. For some reason this script never sees any distinct entries (the if statement never passes):

    string[] rowdta = new string[4];
    Dictionary<string[], int> dict = new Dictionary<string[], int>();
    int ctr = -1;
    if (linectr == 1)
    {
        ctr++;
        dict.Add(rowdta, ctr);
    }
    else
    {
        foreach (KeyValuePair<string[], int> pair in dict)
        {
            if ((pair.Key[1] != rowdta[1]) || (pair.Key[2] != rowdta[2]) || (pair.Key[3] != rowdta[3]))
            {
                ctr++;
                dict.Add(rowdta, ctr);
            }
        }
    }

Some sample data. First line:

    rowdta[0]="ErrorType";
    rowdta[1]="Undefined offset: 0";
    rowdta[2]="/url/routesDisplay2.svc.php";
    rowdta[3]="Line Number 5";

2nd line:

    rowdta[0]="ErrorType";
    rowdta[1]="Undefined offset: 0";
    rowdta[2]="/url/routesDisplay2.svc.php";
    rowdta[3]="Line Number 5";

3rd line:

    rowdta[0]="ErrorType";
    rowdta[1]="Undefined variable: fvmsg";
    rowdta[2]="/url/processes.svc.php";
    rowdta[3]="Line Number 787";

So, with this data, the Dictionary should end up with 2 items in it: the first line and the 3rd line.

I have also tried the following, which also does not find any variations in the log file text.

    if (!dict.ContainsKey(rowdta)) {}

Can someone please help me get this syntax right? I am just a newbie at C#, but this should be relatively straightforward. As always, I am thinking that this should be enough information to get the conversation started. If you want or need more detail, please let me know.

The reason you see the problem is that an array of strings cannot be used as a content-based key in a dictionary without supplying a custom IEqualityComparer<string[]> or writing a wrapper around it. By default, arrays are compared by reference, so two arrays holding the same strings count as different keys.

EDIT: Here is a quick and dirty implementation of a custom comparer:

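// Requires: using System.Collections.Generic; using System.Linq;
private class ArrayEq<T> : IEqualityComparer<T[]> {
    public bool Equals(T[] x, T[] y) {
        return x.SequenceEqual(y);
    }
    public int GetHashCode(T[] obj) {
        // Enumerable.Sum is checked and throws OverflowException when the
        // total exceeds the int range, so combine the hashes unchecked instead.
        unchecked {
            int hash = 17;
            foreach (T o in obj) {
                hash = hash * 31 + (o == null ? 0 : o.GetHashCode());
            }
            return hash;
        }
    }
}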

Here is how you can use it:

var dd = new Dictionary<string[], int>(new ArrayEq<string>());
dd[new[] { "a", "b" }] = 0;
dd[new[] { "a", "b" }]++;
dd[new[] { "a", "b" }]++;
Console.WriteLine(dd[new[] { "a", "b" }]);
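
To connect this back to the question, here is a rough sketch of counting the three sample rows with a content-based comparer. The names `RowComparer`, `LogCounter`, and `CountRows` are made up for this illustration and are not part of any library; the sketch also assumes every log line is read into its own new array.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer for this sketch: compares arrays by content.
public sealed class RowComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[] x, string[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.SequenceEqual(y);
    }

    public int GetHashCode(string[] row)
    {
        unchecked // let the arithmetic wrap instead of throwing on overflow
        {
            int hash = 17;
            foreach (string field in row)
                hash = hash * 31 + (field == null ? 0 : field.GetHashCode());
            return hash;
        }
    }
}

public static class LogCounter
{
    // Counts how often each distinct row occurs.
    public static Dictionary<string[], int> CountRows(IEnumerable<string[]> rows)
    {
        var counts = new Dictionary<string[], int>(new RowComparer());
        foreach (string[] row in rows)
        {
            int seen;
            counts.TryGetValue(row, out seen); // seen stays 0 for a new row
            counts[row] = seen + 1;
        }
        return counts;
    }

    public static void Main()
    {
        var rows = new List<string[]>
        {
            new[] { "ErrorType", "Undefined offset: 0", "/url/routesDisplay2.svc.php", "Line Number 5" },
            new[] { "ErrorType", "Undefined offset: 0", "/url/routesDisplay2.svc.php", "Line Number 5" },
            new[] { "ErrorType", "Undefined variable: fvmsg", "/url/processes.svc.php", "Line Number 787" },
        };

        // Two distinct rows are reported; enumeration order is not guaranteed.
        foreach (var pair in CountRows(rows))
            Console.WriteLine(string.Join(" | ", pair.Key) + " => " + pair.Value);
    }
}
```

One caveat with array keys: if a single `rowdta` buffer is reused and overwritten for every line, keys already stored in the dictionary change underneath it, so each row should get a fresh array.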

Either create a wrapper for your strings which implements IEquatable:

public class LogFileEntry :IEquatable<LogFileEntry>
{
    private readonly string[] _rows;

    public LogFileEntry(string[] rows)
    {
        _rows = rows;
    }

    public override int GetHashCode()
    {
        // Combine the hash codes of all four fields.
        return 
            _rows[3].GetHashCode() << 3 | 
            _rows[2].GetHashCode() << 2 | 
            _rows[1].GetHashCode() << 1 | 
            _rows[0].GetHashCode();
    }

    #region Implementation of IEquatable<LogFileEntry>

    public override bool Equals(Object obj)
    {
        if (obj == null) 
            return base.Equals(obj);

        return Equals(obj as LogFileEntry);  
    } 

    public bool Equals(LogFileEntry other)
    {
        if(other == null) 
            return false;

        return _rows.SequenceEqual(other._rows);
    }

    #endregion
}

Then use that in your dictionary:

var d = new Dictionary<LogFileEntry, int>();

var entry = new LogFileEntry(rows);
if( d.ContainsKey(entry) )
{
    d[entry] ++;
} 
else
{
    d[entry] = 1;
}

Or create a custom comparer similar to that proposed by @dasblinkenlight and use it as follows:

public class LogFileEntry 
{
}

public class LogFileEntryComparer : IEqualityComparer<LogFileEntry>{ ... }

var d = new Dictionary<LogFileEntry, int>(new LogFileEntryComparer());

var entry = new LogFileEntry(rows);
if( d.ContainsKey(entry) )
{
    d[entry] ++;
} 
else
{
    d[entry] = 1;
}
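
For illustration only, one possible shape for such a comparer is sketched below. The `Rows` property and the hash-combining scheme are assumptions made for this example and are not taken from the answer above; `ComparerDemo` is likewise a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape: the entry only exposes its fields and does not override Equals.
public class LogFileEntry
{
    public string[] Rows { get; private set; }

    public LogFileEntry(string[] rows)
    {
        Rows = rows;
    }
}

// Equality lives in the comparer rather than in LogFileEntry itself.
public class LogFileEntryComparer : IEqualityComparer<LogFileEntry>
{
    public bool Equals(LogFileEntry x, LogFileEntry y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Rows.SequenceEqual(y.Rows);
    }

    public int GetHashCode(LogFileEntry entry)
    {
        unchecked // wrap on overflow instead of throwing
        {
            int hash = 17;
            foreach (string field in entry.Rows)
                hash = hash * 31 + (field == null ? 0 : field.GetHashCode());
            return hash;
        }
    }
}

public static class ComparerDemo
{
    public static void Main()
    {
        var d = new Dictionary<LogFileEntry, int>(new LogFileEntryComparer());
        var a = new LogFileEntry(new[] { "ErrorType", "Undefined offset: 0" });
        var b = new LogFileEntry(new[] { "ErrorType", "Undefined offset: 0" });

        d[a] = 1;
        Console.WriteLine(d.ContainsKey(b)); // True: same field values
    }
}
```

Keeping equality in a separate comparer leaves `LogFileEntry` free of Equals/GetHashCode overrides, which can be handy if different dictionaries need different notions of "same entry".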

The problem is that array equality is reference equality. In other words, it does not depend on the values stored in the array; it depends only on the identity of the array.
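
A small sketch of what that means in practice, assuming the default comparer is used. `ArrayKeyDemo` and `FoundByDefaultComparer` are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ArrayKeyDemo
{
    // Stores `stored` as a key, then asks the dictionary for `probe`
    // using the default (reference-based) comparer for arrays.
    public static bool FoundByDefaultComparer(string[] stored, string[] probe)
    {
        var dict = new Dictionary<string[], int>();
        dict.Add(stored, 0);
        return dict.ContainsKey(probe);
    }

    public static void Main()
    {
        var first = new[] { "Undefined offset: 0", "Line Number 5" };
        var second = new[] { "Undefined offset: 0", "Line Number 5" };

        Console.WriteLine(first.Equals(second));                  // False: different array objects
        Console.WriteLine(first.SequenceEqual(second));           // True: same elements, same order
        Console.WriteLine(FoundByDefaultComparer(first, second)); // False: looked up by identity
        Console.WriteLine(FoundByDefaultComparer(first, first));  // True: same reference
    }
}
```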

Some solutions:

  • use Tuple to hold the row data
  • use an anonymous type to hold the row data
  • create a custom type to hold the row data, and, if it is a class, override Equals and GetHashCode.
  • create a custom implementation of IEqualityComparer to compare the arrays according to their values, and pass that to the dictionary when you create it.
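
As a minimal sketch of the first option, a `System.Tuple` of the three distinguishing fields can serve as the key, since tuples compare component by component. The names `TupleKeyDemo` and `CountRows` are hypothetical, and the choice of fields 1 to 3 simply mirrors the comparison in the question.

```csharp
using System;
using System.Collections.Generic;

public static class TupleKeyDemo
{
    // Counts rows using a Tuple of fields 1..3 as the key.
    // No custom comparer is needed because Tuple equality is value-based.
    public static Dictionary<Tuple<string, string, string>, int> CountRows(IEnumerable<string[]> rows)
    {
        var counts = new Dictionary<Tuple<string, string, string>, int>();
        foreach (string[] row in rows)
        {
            var key = Tuple.Create(row[1], row[2], row[3]);
            int seen;
            counts.TryGetValue(key, out seen); // seen stays 0 for a new key
            counts[key] = seen + 1;
        }
        return counts;
    }

    public static void Main()
    {
        var rows = new List<string[]>
        {
            new[] { "ErrorType", "Undefined offset: 0", "/url/routesDisplay2.svc.php", "Line Number 5" },
            new[] { "ErrorType", "Undefined offset: 0", "/url/routesDisplay2.svc.php", "Line Number 5" },
            new[] { "ErrorType", "Undefined variable: fvmsg", "/url/processes.svc.php", "Line Number 787" },
        };

        foreach (var pair in CountRows(rows))
            Console.WriteLine(pair.Key.Item1 + " in " + pair.Key.Item2 + " (" + pair.Key.Item3 + "): " + pair.Value);
    }
}
```

Because the tuple copies the field values out of the array, it does not matter whether the row array is reused afterwards.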
