
How do I reduce the memory footprint with large datasets in EF5?

I'm trying to pull a large-ish dataset (1.4 million records) from SQL Server and dump it to a file in a WinForms application. I've attempted to do it with paging so that I'm not holding too much in memory at once, but the process's memory footprint keeps growing as it runs. About 25% of the way through, it was taking up 600,000K. Am I doing the paging wrong? Can I get some suggestions on how to keep the memory usage from growing so much?

var query = (from organizations in ctxObj.Organizations
                 where organizations.org_type_cd == 1
                 orderby organizations.org_ID
                 select organizations);
int recordCount = query.Count();
int skipTo = 0;
int take = 1000;
if (recordCount > 0)
{
    while (skipTo < recordCount)
    {
        if (skipTo + take > recordCount) 
            take = recordCount - skipTo;

        foreach (Organization o in query.Skip(skipTo).Take(take))
        {
            writeRecord(o);
        }
        skipTo += take;
    }
}

The object context keeps a reference to every entity it loads until the context is disposed. I would recommend disposing the context after each batch to prevent the memory footprint from continuing to grow.
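As a rough sketch of the dispose-per-batch idea, the helper below opens a fresh context for every page and releases it before fetching the next one. `BatchExport.Run` and its parameters are hypothetical names, not part of Entity Framework, and the context is typed only as IDisposable so the sketch compiles without an EF reference. It stops when a page comes back short, so it does not need a separate Count() query.

```csharp
using System;
using System.Linq;

public static class BatchExport
{
    // Hypothetical helper: pages through an ordered query, creating and
    // disposing a new context for each page so tracked entities can be collected.
    public static int Run<TContext, T>(
        Func<TContext> createContext,
        Func<TContext, IQueryable<T>> orderedQuery, // must apply a stable orderby
        int batchSize,
        Action<T> write)
        where TContext : IDisposable
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");

        int total = 0;
        while (true)
        {
            int written = 0;
            using (TContext context = createContext())
            {
                foreach (T item in orderedQuery(context).Skip(total).Take(batchSize))
                {
                    write(item);
                    written++;
                }
            } // the context, and everything it tracked, is released here

            total += written;
            if (written < batchSize) break; // a short page means we reached the end
        }
        return total;
    }
}
```

With the question's query this might be called as `BatchExport.Run(() => new MyContext(), c => c.Organizations.Where(o => o.org_type_cd == 1).OrderBy(o => o.org_ID), 1000, writeRecord)`, where `MyContext` stands in for your own context class.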

You can also use AsNoTracking() ( http://msdn.microsoft.com/en-us/library/gg679352(v=vs.103).aspx ) since you are not saving back to the database.

Get rid of paging and use AsNoTracking.

Test Code

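static void Main(string[] args)
{
    var sw = new Stopwatch(); // requires using System.Diagnostics;
    sw.Start();
    using (var context = new MyEntities())
    {
        // Large sample table, 146994 rows. AsNoTracking() keeps the context
        // from holding a reference to every entity it materializes.
        var query = (from organizations in context.LargeSampleTable.AsNoTracking()
                     where organizations.ErrorID != null
                     orderby organizations.ErrorID
                     select organizations);

        // The query is evaluated lazily, so rows are streamed one at a time.
        foreach (ApplicationErrorLog o in query)
        {
            writeRecord(o);
        }
    }
    sw.Stop();

    Console.WriteLine("Completed after: {0}", sw.Elapsed);
    Console.ReadLine();
}

private static void writeRecord(ApplicationErrorLog o)
{
    // Intentionally empty: the test measures only the query and materialization.
}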

Test Case Result:

Memory Consumption reduced: 96%
Execution Time reduced: 50%

Interpretation

AsNoTracking reduces memory usage for a simple reason: the context no longer has to maintain references to the entities it loads, so objects become eligible for garbage collection almost immediately. Combine lazy evaluation with AsNoTracking and there is no need for paging, and disposing the context can wait until the whole export is done.

While this is a single test, the large number of rows and the exclusion of most external factors make it a reasonable representation of the general case.
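To illustrate the lazy evaluation this relies on, here is a small sketch using a plain iterator, with no database involved. `LazyDemo` and its members are made up for illustration. Each row is created only when the consumer asks for it, so nothing forces the whole result set into memory unless you materialize it yourself.

```csharp
using System;
using System.Collections.Generic;

public static class LazyDemo
{
    // Records the order in which rows are produced and consumed.
    public static readonly List<string> Log = new List<string>();

    // Stand-in for a streaming query: each row is created only when requested.
    public static IEnumerable<int> Rows(int count)
    {
        for (int i = 1; i <= count; i++)
        {
            Log.Add("produce " + i);
            yield return i;
        }
    }

    // Stand-in for the export loop: handles one row, then asks for the next.
    public static void Consume(IEnumerable<int> rows)
    {
        foreach (int row in rows)
        {
            Log.Add("consume " + row);
        }
    }
}
```

Calling `LazyDemo.Consume(LazyDemo.Rows(2))` leaves the log as produce 1, consume 1, produce 2, consume 2. Passing `LazyDemo.Rows(2).ToList()` instead gives produce 1, produce 2, consume 1, consume 2, which is the pattern that holds every row in memory at once.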

A few things.

  1. Calling Count() sends an extra query to the database before you fetch any results. You don't need it: just iterate until the results run out.

  2. The memory you're seeing is due to loading full entities into memory. If you only need a subset of fields, project to an anonymous type (or a simpler named type). This avoids change tracking and its overhead.

Used in this way, EF can be a nice strongly typed API to lightweight SQL queries.

Something like this should do the trick:

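var query = from organizations in ctxObj.Organizations
            where organizations.org_type_cd == 1
            orderby organizations.org_ID
            // Project only the columns you need; Name stands in for whatever field you actually want.
            select new { Id = organizations.org_ID, organizations.Name };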

foreach (var org in query)
{
    write(org.Id, org.Name);
}

Why don't you just use the standard System.Data.SqlClient.SqlConnection class? You can read the results of a command row by row using the SqlDataReader class and write each row to a file. You have full control, so you can guarantee that your code only references one row at a time.

using (var writer = new System.IO.StreamWriter(fileName))
using (var conn = new SqlConnection(connectionString))
{
    conn.Open(); // the connection must be open before ExecuteReader is called

    using (var cmd = new SqlCommand())
    {
        cmd.Connection = conn;
        cmd.CommandText = "SELECT * FROM Organizations WHERE org_type_cd = 1 ORDER BY org_ID";

        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                int id = (int)reader["org_ID"];
                int org_type_cd = (int)reader["org_type_cd"];

                writer.WriteLine(...);
            }
        }
    }
}

Entity Framework isn't meant to solve every problem or to be your exclusive data access framework. It's meant to make simple CRUD operations easier to write. Dealing with millions of rows is a good use case for a more specialized solution.
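If you would rather not name each column by hand, one possible shape for the row-writing step is a small helper that works against any IDataReader. This is only a sketch: `ReaderExport.WriteRows` is a hypothetical name, the tab-separated format is an assumption, and it does not escape tabs or newlines that appear inside values.

```csharp
using System;
using System.Data;
using System.Globalization;
using System.IO;

public static class ReaderExport
{
    // Hypothetical helper: writes every row of an open data reader as one
    // tab-separated line, reusing a single buffer for the current row.
    public static int WriteRows(IDataReader reader, TextWriter writer)
    {
        int rows = 0;
        var values = new object[reader.FieldCount];
        while (reader.Read())
        {
            reader.GetValues(values); // copies the current row into the buffer
            for (int i = 0; i < values.Length; i++)
            {
                if (i > 0) writer.Write('\t');
                // DBNull becomes an empty field; other values are formatted culture-invariantly.
                if (!(values[i] is DBNull))
                    writer.Write(Convert.ToString(values[i], CultureInfo.InvariantCulture));
            }
            writer.WriteLine();
            rows++;
        }
        return rows;
    }
}
```

Since SqlDataReader implements IDataReader, the manual while loop could be replaced with `ReaderExport.WriteRows(reader, writer)`.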


 