
C# OutOfMemory Issue when dealing with large data

In our application we generate reports using a Windows Service. The data for the reports is fetched from SQL Server using a stored procedure. In some scenarios the result set returned contains 250,000 records (we cannot change this part, and we need the data in one go because we have to run calculations over it).

Problem

Our application reads this data through a data reader and converts the result set into a custom collection of custom objects. Because the data is so large, the process cannot hold the complete data in the custom collection and throws an OutOfMemoryException. Watching the process in Task Manager while the report runs, both its memory usage and its CPU utilization climb very high.

I am not sure what we should do in this case.

  1. Can we increase the amount of memory allocated to a single process running under the CLR?
  2. Any other workarounds?

Any help would be really appreciated.

  1. Why do we need all the data at once: we need to run calculations over the complete result set.
  2. We are using ADO.NET and transforming the result set into our custom object collection (see the sketch after this list).
  3. Our system is 32-bit.
  4. We cannot page the data.
  5. We cannot move the computation to SQL Server.
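
For reference, a minimal sketch of the transformation in point 2; the class, column, and procedure names here are simplified placeholders, not our actual code:

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    public class ReportRow
    {
        public int Id { get; set; }
        public decimal Amount { get; set; }
    }

    public static class ReportLoader
    {
        public static List<ReportRow> LoadAll(string connectionString)
        {
            var rows = new List<ReportRow>();
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("dbo.GetReportData", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                connection.Open();
                using (var reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        // Every one of the ~250,000 rows becomes a live object
                        // held in this list; on a 32-bit process this is where
                        // memory runs out.
                        rows.Add(new ReportRow
                        {
                            Id = reader.GetInt32(0),
                            Amount = reader.GetDecimal(1)
                        });
                    }
                }
            }
            return rows;
        }
    }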

This stack trace might help:

    Exception of type 'System.OutOfMemoryException' was thrown.
    Server stack trace:
       at System.Collections.Generic.Dictionary`2.ValueCollection.System.Collections.Generic.IEnumerable<TValue>.GetEnumerator()
       at System.Linq.Enumerable.WhereEnumerableIterator`1.MoveNext()
       at System.Collections.Generic.List`1.InsertRange(Int32 index, IEnumerable`1 collection)
       at System.Collections.Generic.List`1.AddRange(IEnumerable`1 collection)
       at MyProject.Common.Data.DataProperty.GetPropertiesForType(Type t) in C:\Ashish-Stuff\Projects\HCPA\Dev Branch\Common\Benefits.Common\Data\DataProperty.shared.cs:line 60
       at MyProject.Common.Data.Extensions.GetProperties[T](T target) in C:\Ashish-Stuff\Projects\HCPA\Dev Branch\Common\Benefits.Common\Data\Extensions.shared.cs:line 30
       at MyProject.Common.Data.Factories.SqlServerDataFactoryContract`1.GetData(String procedureName, IDictionary`2 parameters, Nullable`1 languageId, Nullable`1 pageNumber, Nullable`1 pageSize)

Thanks, Ashish

Could you serialize your custom collection of objects to disk every 1,000 rows? Then, when you return data, paginate it from those files?
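
A rough sketch of what I mean, assuming each row can be written as one line of text; the chunk-file layout and helper names are made up for illustration:

    using System.Collections.Generic;
    using System.IO;

    public static class ChunkedSpooler
    {
        // Write the rows to disk in files of 1,000 lines each and return
        // the number of chunk files produced.
        public static int Spool(IEnumerable<string> rowLines, string chunkDir)
        {
            Directory.CreateDirectory(chunkDir);
            var buffer = new List<string>(1000);
            int chunkIndex = 0;
            foreach (var line in rowLines)
            {
                buffer.Add(line);
                if (buffer.Count == 1000)
                {
                    File.WriteAllLines(Path.Combine(chunkDir, $"chunk{chunkIndex++}.txt"), buffer);
                    buffer.Clear();  // only 1,000 rows are ever held in memory
                }
            }
            if (buffer.Count > 0)
                File.WriteAllLines(Path.Combine(chunkDir, $"chunk{chunkIndex++}.txt"), buffer);
            return chunkIndex;
        }

        // Read the chunks back lazily, one row at a time.
        public static IEnumerable<string> ReadBack(string chunkDir, int chunkCount)
        {
            for (int i = 0; i < chunkCount; i++)
                foreach (var line in File.ReadLines(Path.Combine(chunkDir, $"chunk{i}.txt")))
                    yield return line;
        }
    }

Each chunk file can then be deserialized independently, so no more than 1,000 rows are ever live at once.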

More info on your use case, i.e. why you need to pull back 250,000 rows of data in one go, would be helpful.

My first thought was that the computation could be done on the SQL Server side by a stored procedure. I suspect this approach requires some SQL Server jedi... but anyway, have you considered such an approach?
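
If that were possible, the client side would shrink to a single scalar call. A minimal sketch; the procedure name dbo.GetReportTotals and its result shape are assumptions:

    using System.Data;
    using System.Data.SqlClient;

    public static class ReportTotals
    {
        public static decimal GetReportTotal(string connectionString)
        {
            using (var connection = new SqlConnection(connectionString))
            using (var command = new SqlCommand("dbo.GetReportTotals", connection))
            {
                command.CommandType = CommandType.StoredProcedure;
                connection.Open();
                // Only one scalar crosses the wire instead of 250,000 rows.
                return (decimal)command.ExecuteScalar();
            }
        }
    }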

I would love to see a code sample highlighting where exactly you are getting this error. Is it on the data pull itself (populating the reader), or is it when creating the objects and adding them to the custom collection (populating the collection)?

I have had similar issues before when dealing with VERY LARGE datasets, but had great success with leaving the data in a stream for as long as possible. A stream keeps the data directly in memory as raw bytes, so nothing ever has direct access to the entire mess until you finish building the objects you need. Now, given that the stack trace shows the error on a "MoveNext" operation, this may not work for you. In that case, try to chunk the data: grab 10,000 rows at a time or so; I know this can be done with SQL (see the sketch below). It will make the data read take a lot longer, though.
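
A rough sketch of the chunking idea, assuming SQL Server 2012+ (for OFFSET/FETCH) and a stable ordering key; the table and column names are placeholders:

    using System.Collections.Generic;
    using System.Data.SqlClient;

    public static class ChunkedReader
    {
        // Pull 10,000 rows per round trip and stream the values out lazily,
        // so only one page of rows is in flight at any time.
        public static IEnumerable<decimal> ReadAmounts(string connectionString)
        {
            const int pageSize = 10000;
            for (int offset = 0; ; offset += pageSize)
            {
                int rowsInPage = 0;
                using (var connection = new SqlConnection(connectionString))
                using (var command = new SqlCommand(
                    @"SELECT Amount FROM dbo.ReportData
                      ORDER BY Id
                      OFFSET @Offset ROWS FETCH NEXT @PageSize ROWS ONLY", connection))
                {
                    command.Parameters.AddWithValue("@Offset", offset);
                    command.Parameters.AddWithValue("@PageSize", pageSize);
                    connection.Open();
                    using (var reader = command.ExecuteReader())
                    {
                        while (reader.Read())
                        {
                            rowsInPage++;
                            yield return reader.GetDecimal(0);
                        }
                    }
                }
                if (rowsInPage < pageSize)
                    yield break;  // last page reached
            }
        }
    }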

EDIT

If you read this from the database into a local stream that you then pass around (just be careful not to close it), you shouldn't run into these issues. Make a data wrapper class that you can pass around, holding an open stream and an open reader. Store the data in the stream and then use wrapper functions to read the specific data you need from it: things like GetSumOfXField() or AverageOfYValues(), etc. The data will never live in a full set of objects, but you won't have to keep going back to the database for it.

Pseudo Example

    public void ReadingTheDataFunction()
    {
        DbDataReader reader = dbCommand.ExecuteReader();
        MyDataStore.FillDataSource(reader);
    }

    private void FillDataSource(DbDataReader reader)
    {
        // Copy each row into the long-lived stream as one line of text.
        StreamWriter writer = new StreamWriter(GlobalDataStream);
        while (reader.Read())
            writer.WriteLine(BuildStringFromDataRow(reader));
        writer.Flush();     // flush the writer, but keep the stream open
        reader.Close();     // the database reader is no longer needed
    }

    private CustomObject GetNextRow()
    {
        String line = GlobalDataReader.ReadLine();
        CustomObject ret = ParseLineToCustomObject(line);  // parse the line into a custom object
        return ret;
    }

From there you pass around MyDataStore, and as long as the stream and reader aren't closed, you can move your position around, look up individual entries, compile sums and averages, and so on. You never really have to know you aren't dealing with a live object, as long as you only interact with it through the interface functions.
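
For example, GetSumOfXField() might look like the sketch below, assuming each spooled line is a comma-separated row (the column layout is an assumption for this sketch):

    using System;
    using System.IO;

    public class MyDataStore
    {
        // The long-lived stream filled by FillDataSource in the pseudo example.
        private readonly Stream GlobalDataStream;

        public MyDataStore(Stream backingStream)
        {
            GlobalDataStream = backingStream;
        }

        // Stream the spooled rows and aggregate one line at a time; only a
        // single line is ever in memory, never the whole result set.
        public decimal GetSumOfXField(int columnIndex)
        {
            GlobalDataStream.Position = 0;           // rewind to the first row
            var reader = new StreamReader(GlobalDataStream);
            decimal sum = 0;
            string line;
            while ((line = reader.ReadLine()) != null)
                sum += decimal.Parse(line.Split(',')[columnIndex]);
            return sum;  // the reader is deliberately left open (see above)
        }
    }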
