
System.OutOfMemoryException while reading a large text file using C#

I have code that reads a text file and populates a .NET DataTable. The code works fine when reading a smaller text file with 100,000 lines of data (see snippet below). When I try to read a larger text file, around 200 MB with 3.6 million lines of data, it throws a System.OutOfMemoryException. I would like to ask for an efficient way of reading large data in chunks.

        using (var stream = File.Open(filePath, FileMode.Open))
        {
            var content = new StreamContent(stream);
            var fileStream = content.ReadAsStreamAsync().Result;

            if (fileStream == null) throw new ArgumentException(Constants.FileEmptyErrorMessage);

            using (var bs = new BufferedStream(fileStream))
            {
                using (var reader = new StreamReader(bs, Encoding.GetEncoding(Constants.IsoEncoding)))
                {
                    while (!reader.EndOfStream)
                    {
                        var line = reader.ReadLine();
                        if (!String.IsNullOrEmpty(line))
                        {
                            string[] rows = line.Trim().Split(new char[] { ';' }, StringSplitOptions.None);

                            DataRow dr = Table.NewRow();
                            dr[Constants.Percepcion] = rows[0];
                            dr[Constants.StartDate] = DateTime.ParseExact(rows[2].ToString(), "ddMMyyyy",
                                CultureInfo.InvariantCulture);
                            dr[Constants.EndDate] = DateTime.ParseExact(rows[3].ToString(), "ddMMyyyy",
                                CultureInfo.InvariantCulture);
                            dr[Constants.CID] = rows[4];
                            dr[Constants.Rate] = rows[8];

                            Table.Rows.Add(dr);
                        }
                    }
                }
            }
        }

I can see that the memory problem is not caused by reading the whole file at once, since you already read it line by line with var line = reader.ReadLine();. I think it is caused by the size of the DataTable Table, which accumulates the data of the entire file.
I suggest one of these options:
1. If you are performing aggregation functions on the rows of the DataTable, just compute them on the fly (e.g. an integer counter, or a double max_columnX) without keeping the whole rows.
2. If you really need to keep all the rows, create a database (MSSQL/MySQL/or any other) and, as you read the file line by line, insert the data into the database. Then query the database with your criteria.
3. You can bulk insert the whole file into a database without processing it through your C# application at all. Here is a SQL Server example:

BULK INSERT AdventureWorks2012.Sales.SalesOrderDetail
   FROM 'f:\orders\lineitem.tbl'
   WITH
     (
        FIELDTERMINATOR =';',
        ROWTERMINATOR = '\n',
        FIRE_TRIGGERS
      );
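For option 2, the per-row inserts can be batched so that memory stays bounded. The sketch below is one possible way to do it with SqlBulkCopy, flushing a fixed-size DataTable batch to the server and clearing it; the connection string, the target table name dbo.Percepciones, and the batch size are assumptions, not part of the original question.

```csharp
// Minimal sketch: stream the file and flush rows to SQL Server in batches,
// so at most BatchSize rows are ever held in memory.
// "dbo.Percepciones" and the connection string are placeholders.
using System;
using System.Data;
using System.Data.SqlClient;
using System.Globalization;
using System.IO;

class ChunkedLoader
{
    const int BatchSize = 10000; // illustrative batch size

    static void Load(string filePath, string connectionString)
    {
        var table = new DataTable();
        table.Columns.Add("Percepcion", typeof(string));
        table.Columns.Add("StartDate", typeof(DateTime));
        table.Columns.Add("EndDate", typeof(DateTime));
        table.Columns.Add("CID", typeof(string));
        table.Columns.Add("Rate", typeof(string));

        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.Percepciones";

            // File.ReadLines streams lazily; only one line is read at a time.
            foreach (var line in File.ReadLines(filePath))
            {
                if (string.IsNullOrWhiteSpace(line)) continue;
                var rows = line.Trim().Split(';');

                table.Rows.Add(
                    rows[0],
                    DateTime.ParseExact(rows[2], "ddMMyyyy", CultureInfo.InvariantCulture),
                    DateTime.ParseExact(rows[3], "ddMMyyyy", CultureInfo.InvariantCulture),
                    rows[4],
                    rows[8]);

                if (table.Rows.Count >= BatchSize)
                {
                    bulk.WriteToServer(table); // flush the batch to the server
                    table.Clear();             // release the rows
                }
            }

            if (table.Rows.Count > 0)
                bulk.WriteToServer(table);     // final partial batch
        }
    }
}
```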

Edit: You can attach a memory profiler to find out what exactly takes up the memory, and add the result to the question. It will help in getting better answers.

If you alter the default buffer size of your BufferedStream, then it should load the larger files for you with greater efficiency. E.g.

using (var bs = new BufferedStream(fileStream, 1024))
{
    // Code here.
}

You may be able to get away with simply using a FileStream, specifying a buffer size as well, rather than a BufferedStream. See this MSDN blog post for further details.
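A minimal sketch of that alternative, passing the buffer size to the FileStream constructor directly; the 64 KB buffer and the "ISO-8859-1" encoding (standing in for Constants.IsoEncoding) are illustrative assumptions:

```csharp
// FileStream accepts a buffer size directly, so the BufferedStream wrapper
// can be dropped. 64 KB is an illustrative value, not a recommendation.
using System.IO;
using System.Text;

using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read,
                               FileShare.Read, bufferSize: 64 * 1024))
using (var reader = new StreamReader(fs, Encoding.GetEncoding("ISO-8859-1")))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // process the line here
    }
}
```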

Here's what I did to read a big text file. No need to use a buffered stream.

var filteredTextFileData = (from textFileData in File.ReadAllLines(_filePathList[0]).Skip(1).Where(line => !string.IsNullOrEmpty(line))
                    let textline = textFileData.Split(';')
                    let startDate = DateTime.ParseExact(textline[2].ToString(), Constants.DayMonthYearFormat, CultureInfo.InvariantCulture)
                    let endDate = !string.IsNullOrEmpty(textline[3]) ? DateTime.ParseExact(textline[3], Constants.DayMonthYearFormat, CultureInfo.InvariantCulture) : (DateTime?)null
                    let taxId = textline[0]
                    join accountList in _accounts.AsEnumerable()
                    on taxId equals accountList.Field<string>(Constants.Comments)
                    where endDate == null || endDate.Value.Year > DateTime.Now.Year || (endDate.Value.Year == DateTime.Now.Year && endDate.Value.Month >= DateTime.Now.Month)
                    select new RecordItem()
                    {
                        Type = Constants.Regular,
                        CustomerTaxId = taxId,
                        BillingAccountNumber = accountList.Field<Int64>(Constants.AccountNo).ToString(),
                        BillingAccountName = accountList.Field<string>(Constants.BillCompany),
                        StartDate = DateTime.Compare(startDate, accountList.Field<DateTime>(Constants.DateActive)) < 1 ? accountList.Field<DateTime>(Constants.DateActive) : startDate,
                        EndDate = endDate,
                        OverrideRate = 0,
                        Result = Constants.NotStarted,
                        TaxCode = _taxCode,
                        ImpliedDecimal = 4
                    }).ToList();
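One caveat with the query above: File.ReadAllLines materializes the entire file as a string[] before the query runs, so for a file that already causes OutOfMemoryException it may not help. File.ReadLines can be substituted without changing the rest of the query, since it yields lines lazily. A minimal sketch of the difference, using a hypothetical "data.txt":

```csharp
// File.ReadAllLines reads every line into a string[] up front;
// File.ReadLines returns a lazy IEnumerable<string> that streams the file.
using System.IO;
using System.Linq;

// Eager: the whole file is in memory before Skip/Where even run.
var eager = File.ReadAllLines("data.txt").Skip(1).Where(l => l.Length > 0);

// Lazy: lines are read one at a time, as the query is enumerated.
var lazy = File.ReadLines("data.txt").Skip(1).Where(l => l.Length > 0);
```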

Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.

 