
C# OutOfMemoryException with Regex

I'm getting an OutOfMemoryException at:

if (Regex.IsMatch(output, @"^\d"))

But I'm unsure what's causing it. My program had been running for about 4 minutes, reading text files (a lot of them) and bulk inserting them into SQL. The output string at the time contained nothing special: a small piece of text read from a .txt file.

I'm assuming this is happening because of the number of times the regex check has to run; after 4 minutes it had run on the order of a million times. Is there a way to prevent the memory problem? Should I dispose or clear something before I start looping? If so, how do you do that?

EDIT: I'm not reading one big file; I'm reading a lot of files. When it failed, it had already read around 6666 files (5 folders), but it needs to read 60 folders in total, which is 80,361 .txt files.

EDIT: Added the source code, hoping to clarify.

UPDATE:

I added a DisposeAll method:

static void DisposeAll(IEnumerable set)
{
    foreach (Object obj in set)
    {
        IDisposable disp = obj as IDisposable;
        if (disp != null) { disp.Dispose(); }
    }
}

And I'm executing this at the end of each iteration of the folder loop:

DisposeAll(ListExtraInfo);
DisposeAll(ListFouten);
ListFouten.Clear();
ListExtraInfo.Clear();

The error location changed: it is no longer the Regex, but ListFouten that is causing it now. It still happens after around 6666 .txt files have been read.

Exception of type 'System.OutOfMemoryException' was thrown.

static void Main(string[] args)
        {
            string pathMMAP = @"G:\HLE13\Resultaten\MMAP";
            string[] entriesMMAP = Directory.GetDirectories(pathMMAP);
            List<string> treinNamen = new List<string>();

            foreach (string path in entriesMMAP)
            {
                string TreinNaam = new DirectoryInfo(path).Name;
                treinNamen.Add(TreinNaam);
                int IdTrein = 0;
                ListExtraInfo = new List<extraInfo>();
                ListFouten = new List<fouten>();
                readData(TreinNaam, IdTrein, path);
             }
        }


        static void readData(string TreinNaam, int IdTrein, string path)
        {
            using (SqlConnection sourceConnection = new SqlConnection(GetConnectionString()))
            {
                sourceConnection.Open();


                try
                {
                    SqlCommand commandRowCount = new SqlCommand(
                 "SELECT TreinId FROM TestDatabase.dbo.Treinen where Name = " + TreinNaam,
                 sourceConnection);
                    IdTrein = Convert.ToInt16(commandRowCount.ExecuteScalar());

                }
                catch (Exception ex)
                {


                }

            }

            string[] entriesTreinen = Directory.GetDirectories(path);
            foreach (string rapport in entriesTreinen)
            {

                string RapportNaam = new DirectoryInfo(rapport).Name;
                FileInfo fileData = new System.IO.FileInfo(rapport);

                leesTxt(rapport, TreinNaam, GetConnectionString(), IdTrein);

            }
        }
        public static string datum;
        public static string tijd;
        public static string foutcode;
        public static string absentOfPresent;
        public static string teller;
        public static string omschrijving;
        public static List<fouten> ListFouten;
        public static List<extraInfo> ListExtraInfo;
        public static string textname;
        public static int referentieGetal = 0;


        static void leesTxt(string rapport, string TreinNaam, string myConnection, int TreinId)
        {
            foreach (string textFilePath in Directory.EnumerateFiles(rapport, "*.txt"))
            {

                textname = Path.GetFileName(textFilePath);
                textname = textname.Substring(0, textname.Length - 4);

                using (StreamReader r = new StreamReader(textFilePath))
                {
                    for (int x = 0; x <= 10; x++)
                        r.ReadLine();

                    string output;

                    Regex compiledRegex = new Regex(@"^\d", RegexOptions.Compiled);
                    string[] info = new string[] { };
                    string[] datumTijdelijk = new string[] { };

                    while (true)
                    {

                        output = r.ReadLine();
                        if (output == null)
                            break;


                        if (compiledRegex.IsMatch(output))
                        {
                            info = output.Split(' ');
                            int kolom = 6;
                            datum = info[0];
                            datumTijdelijk = datum.Split(new[] { '/' });


                            try
                            {
                                datum = string.Format("{2}/{1}/{0}", datumTijdelijk);
                                tijd = info[1];
                                foutcode = info[2];
                                absentOfPresent = info[4];
                                teller = info[5];
                                omschrijving = info[6];
                            }
                            catch (Exception ex)
                            {

                            }


                            while (kolom < info.Count() - 1)
                            {
                                kolom++;
                                omschrijving = omschrijving + " " + info[kolom];
                            }
                            referentieGetal++;


                            ListFouten.Add(new fouten { Date = datum, Time = tijd, Description = omschrijving, ErrorCode = foutcode, Module = textname, Name = TreinNaam, TreinId = TreinId, FoutId = referentieGetal });

                        }


                        if (output == string.Empty)
                        {
                            output = " ";
                        }
                        if (Char.IsLetter(output[0]))
                        {
                            ListExtraInfo.Add(new extraInfo { Values = output, FoutId = referentieGetal });
                        }

                    }

                }

            }

        }

Could it be because your code is recompiling the regular expression every time it is used? Try using a compiled Regex instead. Outside your foreach loop, store a compiled Regex variable:

Regex compiledRegex = new Regex(@"^\d", RegexOptions.Compiled);

Then, when checking for the match, use:

if (compiledRegex.IsMatch(output))
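As a minimal sketch of what this suggestion amounts to, the Regex can live in a static readonly field so that it is constructed once rather than once per file. The names LineFilter, IsErrorLine and Demo are hypothetical and not part of the question's code.

```csharp
using System;
using System.Text.RegularExpressions;

static class LineFilter
{
    // Constructed once per process and reused for every line of every file.
    private static readonly Regex StartsWithDigit =
        new Regex(@"^\d", RegexOptions.Compiled);

    // Returns true when the line begins with a digit (the question's "error line" test).
    public static bool IsErrorLine(string line)
    {
        return line != null && StartsWithDigit.IsMatch(line);
    }
}

static class Demo
{
    static void Main()
    {
        Console.WriteLine(LineFilter.IsErrorLine("12/03/2014 10:15:02 F123")); // True
        Console.WriteLine(LineFilter.IsErrorLine("Extra info"));              // False
    }
}
```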

Edit: this answer is not valid. Though the Regex documentation states that Regex expressions encountered in instance methods would be recompiled, this is not the case: they are cached.

The regex operations are not at fault here; the real fault lies in the data that ends up being stored around the regex processing.

The analogy is driving a car and saying "It ran out of gas while I had the radio on". It is not the radio's fault...

I recommend that you identify why such copious amounts of data are being stored and resolve that.


There are better ways of processing and analyzing information than throwing everything in memory. I believe that you will need to rewrite the logic to achieve the end goal.
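One possible shape for such a rewrite is to buffer only a bounded number of rows and hand each full batch to the database before reading more. The sketch below is an illustration under assumptions, not the asker's code: BatchBuffer and Demo are hypothetical names, and the flush callback stands in for whatever bulk insert the program performs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: holds at most `capacity` items, then passes them to
// `flush` and forgets them, so memory use stays bounded.
sealed class BatchBuffer<T>
{
    private readonly List<T> _items;
    private readonly int _capacity;
    private readonly Action<IReadOnlyList<T>> _flush;

    public BatchBuffer(int capacity, Action<IReadOnlyList<T>> flush)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException("capacity");
        if (flush == null) throw new ArgumentNullException("flush");
        _capacity = capacity;
        _flush = flush;
        _items = new List<T>(capacity);
    }

    public void Add(T item)
    {
        _items.Add(item);
        if (_items.Count >= _capacity) Flush();
    }

    // Call once more after the last Add to push out a partial batch.
    // The callback must not keep a reference to the list: it is cleared afterwards.
    public void Flush()
    {
        if (_items.Count == 0) return;
        _flush(_items);
        _items.Clear();
    }
}

static class Demo
{
    static void Main()
    {
        int written = 0;
        // The lambda is a placeholder for the real bulk insert.
        var buffer = new BatchBuffer<string>(3, batch => { written += batch.Count; });
        for (int i = 0; i < 10; i++) buffer.Add("row " + i);
        buffer.Flush();
        Console.WriteLine(written); // 10
    }
}
```

With this shape, the lists no longer grow with the number of files read; the batch size is the only thing that determines how many rows are held at once.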

Why are you collecting, and more importantly saving, information about every line of 6000+ files? That might be the real issue here...
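If each line only needs to be inspected once, the parsing can be written as a lazy iterator that yields one record at a time instead of adding it to a list. This is a sketch under assumptions: ErrorRecord is a hypothetical stand-in for the question's fouten class, and only the first three space-separated fields are kept.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical record type standing in for the question's `fouten` class.
sealed class ErrorRecord
{
    public string Date;
    public string Time;
    public string ErrorCode;
}

static class ErrorParser
{
    // Yields records one by one; nothing is accumulated here.
    public static IEnumerable<ErrorRecord> Parse(IEnumerable<string> lines)
    {
        foreach (string line in lines)
        {
            // Same test as the regex ^\d: keep only lines starting with a digit.
            if (string.IsNullOrEmpty(line) || !char.IsDigit(line[0])) continue;

            string[] parts = line.Split(' ');
            if (parts.Length < 3) continue; // skip malformed lines instead of throwing

            yield return new ErrorRecord
            {
                Date = parts[0],
                Time = parts[1],
                ErrorCode = parts[2]
            };
        }
    }
}

static class Demo
{
    static void Main()
    {
        var lines = new[] { "header", "12/03/2014 10:15:02 F123 rest", "", "7" };
        int count = 0;
        foreach (ErrorRecord record in ErrorParser.Parse(lines)) count++;
        Console.WriteLine(count); // 1
    }
}
```

In real use the input could come from File.ReadLines(path), which also reads lazily, so a file is never held in memory as a whole.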


Otherwise, be proactive with these steps
