
System.OutOfMemoryException while reading a large text file using C#

I have code that reads a text file and populates a .NET DataTable. The code works fine when reading a smaller text file with 100,000 lines of data (see snippet below). When I try to read a larger text file, around 200 MB with 3.6 million lines of data, it throws a System.OutOfMemoryException. I would like to ask for an efficient way of reading large data in chunks.

        using (var stream = File.Open(filePath, FileMode.Open))
        {
            var content = new StreamContent(stream);
            var fileStream = content.ReadAsStreamAsync().Result;

            if (fileStream == null) throw new ArgumentException(Constants.FileEmptyErrorMessage);

            using (var bs = new BufferedStream(fileStream))
            {
                using (var reader = new StreamReader(bs, Encoding.GetEncoding(Constants.IsoEncoding)))
                {
                    while (!reader.EndOfStream)
                    {
                        var line = reader.ReadLine();
                        if (!String.IsNullOrEmpty(line))
                        {
                            string[] rows = line.Trim().Split(new char[] { ';' }, StringSplitOptions.None);

                            DataRow dr = Table.NewRow();
                            dr[Constants.Percepcion] = rows[0];
                            dr[Constants.StartDate] = DateTime.ParseExact(rows[2].ToString(), "ddMMyyyy",
                                CultureInfo.InvariantCulture);
                            dr[Constants.EndDate] = DateTime.ParseExact(rows[3].ToString(), "ddMMyyyy",
                                CultureInfo.InvariantCulture);
                            dr[Constants.CID] = rows[4];
                            dr[Constants.Rate] = rows[8];

                            Table.Rows.Add(dr);
                        }
                    }
                }
            }
        }

I can see that the memory problem is not caused by reading the whole file at once, since you already read it line by line with var line = reader.ReadLine();. I think it is caused by the size of the DataTable Table, which accumulates the data of the entire file.
I suggest one of these options:
1. If you are performing aggregation functions on the rows of the DataTable, just compute them on the fly (e.g. an integer counter, or a double max_columnX) without keeping the whole rows.
2. If you really need to keep all the rows, create a database (MSSQL/MySQL/or any other) and, as you read the file line by line, insert the data into the database. Then query the database with your criteria.
3. You can bulk insert the whole file into a database without processing it through your C# application at all. Here is a SQL Server example:

BULK INSERT AdventureWorks2012.Sales.SalesOrderDetail
   FROM 'f:\orders\lineitem.tbl'
   WITH
     (
        FIELDTERMINATOR =';',
        ROWTERMINATOR = '\n',
        FIRE_TRIGGERS
      );
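For option 2, the per-row inserts can be batched so that memory stays bounded. The sketch below is one possible way to do it with SqlBulkCopy, flushing a fixed-size DataTable batch to the server and clearing it; the connection string, the target table name dbo.Percepciones, and the batch size are assumptions, not part of the original question.

```csharp
// Minimal sketch: stream the file and flush rows to SQL Server in batches,
// so at most BatchSize rows are ever held in memory.
// "dbo.Percepciones" and the connection string are placeholders.
using System;
using System.Data;
using System.Data.SqlClient;
using System.Globalization;
using System.IO;

class ChunkedLoader
{
    const int BatchSize = 10000; // illustrative batch size

    static void Load(string filePath, string connectionString)
    {
        var table = new DataTable();
        table.Columns.Add("Percepcion", typeof(string));
        table.Columns.Add("StartDate", typeof(DateTime));
        table.Columns.Add("EndDate", typeof(DateTime));
        table.Columns.Add("CID", typeof(string));
        table.Columns.Add("Rate", typeof(string));

        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.Percepciones";

            // File.ReadLines streams lazily; only one line is read at a time.
            foreach (var line in File.ReadLines(filePath))
            {
                if (string.IsNullOrWhiteSpace(line)) continue;
                var rows = line.Trim().Split(';');

                table.Rows.Add(
                    rows[0],
                    DateTime.ParseExact(rows[2], "ddMMyyyy", CultureInfo.InvariantCulture),
                    DateTime.ParseExact(rows[3], "ddMMyyyy", CultureInfo.InvariantCulture),
                    rows[4],
                    rows[8]);

                if (table.Rows.Count >= BatchSize)
                {
                    bulk.WriteToServer(table); // flush the batch to the server
                    table.Clear();             // release the rows
                }
            }

            if (table.Rows.Count > 0)
                bulk.WriteToServer(table);     // final partial batch
        }
    }
}
```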

Edit: You can attach a memory profiler to find out what exactly takes up the memory, and add the result to the question. It will help in getting better answers.

If you alter the default buffer size of your BufferedStream, then it should load the larger files for you with greater efficiency. E.g.

using (var bs = new BufferedStream(fileStream, 1024))
{
    // Code here.
}

You may be able to get away with simply using a FileStream, specifying a buffer size as well, rather than a BufferedStream. See this MSDN blog post for further details.
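A minimal sketch of that alternative, passing the buffer size to the FileStream constructor directly; the 64 KB buffer and the "ISO-8859-1" encoding (standing in for Constants.IsoEncoding) are illustrative assumptions:

```csharp
// FileStream accepts a buffer size directly, so the BufferedStream wrapper
// can be dropped. 64 KB is an illustrative value, not a recommendation.
using System.IO;
using System.Text;

using (var fs = new FileStream(filePath, FileMode.Open, FileAccess.Read,
                               FileShare.Read, bufferSize: 64 * 1024))
using (var reader = new StreamReader(fs, Encoding.GetEncoding("ISO-8859-1")))
{
    string line;
    while ((line = reader.ReadLine()) != null)
    {
        // process the line here
    }
}
```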

Here's what I did to read a big text file. No need to use a buffered stream.

var filteredTextFileData = (from textFileData in File.ReadAllLines(_filePathList[0]).Skip(1).Where(line => !string.IsNullOrEmpty(line))
                    let textline = textFileData.Split(';')
                    let startDate = DateTime.ParseExact(textline[2].ToString(), Constants.DayMonthYearFormat, CultureInfo.InvariantCulture)
                    let endDate = !string.IsNullOrEmpty(textline[3]) ? DateTime.ParseExact(textline[3], Constants.DayMonthYearFormat, CultureInfo.InvariantCulture) : (DateTime?)null
                    let taxId = textline[0]
                    join accountList in _accounts.AsEnumerable()
                    on taxId equals accountList.Field<string>(Constants.Comments)
                    where endDate == null || endDate.Value.Year > DateTime.Now.Year || (endDate.Value.Year == DateTime.Now.Year && endDate.Value.Month >= DateTime.Now.Month)
                    select new RecordItem()
                    {
                        Type = Constants.Regular,
                        CustomerTaxId = taxId,
                        BillingAccountNumber = accountList.Field<Int64>(Constants.AccountNo).ToString(),
                        BillingAccountName = accountList.Field<string>(Constants.BillCompany),
                        StartDate = DateTime.Compare(startDate, accountList.Field<DateTime>(Constants.DateActive)) < 1 ? accountList.Field<DateTime>(Constants.DateActive) : startDate,
                        EndDate = endDate,
                        OverrideRate = 0,
                        Result = Constants.NotStarted,
                        TaxCode = _taxCode,
                        ImpliedDecimal = 4
                    }).ToList();
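One caveat with the query above: File.ReadAllLines materializes the entire file as a string[] before the query runs, so for a file that already causes OutOfMemoryException it may not help. File.ReadLines can be substituted without changing the rest of the query, since it yields lines lazily. A minimal sketch of the difference, using a hypothetical "data.txt":

```csharp
// File.ReadAllLines reads every line into a string[] up front;
// File.ReadLines returns a lazy IEnumerable<string> that streams the file.
using System.IO;
using System.Linq;

// Eager: the whole file is in memory before Skip/Where even run.
var eager = File.ReadAllLines("data.txt").Skip(1).Where(l => l.Length > 0);

// Lazy: lines are read one at a time, as the query is enumerated.
var lazy = File.ReadLines("data.txt").Skip(1).Where(l => l.Length > 0);
```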

Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.

 