
C# Serializing large datasets

I am trying to move data from a Microsoft SQL Server database into Elasticsearch. I am using EF 6 to generate the models (Code First from database) and NEST to serialize the objects into Elasticsearch.

With lazy loading it works, but it is unbelievably slow (so slow it is unusable). If I switch to eager loading by disabling lazy loading in the context constructor:

public MyContext() : base("name=MyContext")
{
    // related entities must now be loaded explicitly via Include()
    this.Configuration.LazyLoadingEnabled = false;
}

and serialize like this:

ElasticClient client = new ElasticClient(settings);

var allObjects = context.objects
    .Include("item1")
    .Include("item2")
    .Include("item2.item1")
    .Include("item2.item1.item");

// IndexMany enumerates the query, which materializes
// the entire result set in memory at once
client.IndexMany(allObjects);

I get a System.OutOfMemoryException before the serialization even takes place, i.e. just from loading the data. I have around 2.5 GB of available memory, and we are talking about 110,000 items in the database.

I have tried sorting the data and then using Skip and Take to serialize only a certain number of objects at a time, but I only got about 60,000 objects into Elasticsearch before running out of memory. It seems the garbage collector does not free enough memory, even though I called it explicitly after inserting each batch into Elasticsearch.
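Roughly, that attempt looked like this (a sketch reconstructed from the description above; the batch size of 10,000 is an assumption):

// Sketch of the failed attempt: one long-lived context, batched with
// Skip/Take, and an explicit GC call after each batch.
using (var context = new MyContext())
{
    int total = context.objects.Count();

    for (int i = 0; i < total; i += 10000)
    {
        var batch = context.objects.OrderBy(s => s.ID)
            .Skip(i)
            .Take(10000)
            .Include("item1")
            .Include("item2")
            .Include("item2.item1")
            .Include("item2.item1.item");

        client.IndexMany(batch);
        GC.Collect(); // frees little: the single context still tracks every entity loaded so far
    }
}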

Is there a way to eager load only a specific number of objects at a time? Or is there another approach to serializing large datasets?

In hindsight, a silly mistake. By doing the following, I managed to accomplish my goal:

int numberOfObjects;

using (var context = new MyContext())
{
    numberOfObjects = context.objects.Count();
}

for (int i = 0; i < numberOfObjects; i += 10000)
{
    // a fresh context per batch, so tracked entities can be
    // collected once the batch has been indexed
    using (var context = new MyContext())
    {
        var allObjects = context.objects.OrderBy(s => s.ID)
            .Skip(i)
            .Take(10000)
            .Include("item1")
            .Include("item2")
            .Include("item2.item1")
            .Include("item2.item1.item");

        client.IndexMany(allObjects);
    }
}

This allows the garbage collector to do its job, since each context lives only inside the for-loop. I don't know if there is a faster way; I am able to insert about 100,000 objects into Elasticsearch in about 400 seconds.
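Two things might speed this up further, though I have not benchmarked either against this dataset: EF 6's AsNoTracking() skips change tracking for read-only queries, and newer NEST versions ship a BulkAll helper that streams a lazy sequence into Elasticsearch in concurrent bulk requests. A sketch combining the two (MyObject is a placeholder for the real entity type; the batch size and degree of parallelism are guesses):

// Sketch, not benchmarked. Lazily yields read-only batches,
// one context per batch, so memory stays bounded.
IEnumerable<MyObject> StreamObjects()
{
    int numberOfObjects;
    using (var context = new MyContext())
    {
        numberOfObjects = context.objects.Count();
    }

    for (int i = 0; i < numberOfObjects; i += 10000)
    {
        using (var context = new MyContext())
        {
            var batch = context.objects.AsNoTracking() // no change tracking for read-only data
                .OrderBy(s => s.ID)
                .Skip(i)
                .Take(10000)
                .Include("item1")
                .Include("item2")
                .Include("item2.item1")
                .Include("item2.item1.item")
                .ToList();

            foreach (var obj in batch)
                yield return obj;
        }
    }
}

// BulkAll partitions the stream into concurrent bulk requests,
// using the default index from the connection settings.
var bulkAll = client.BulkAll(StreamObjects(), b => b
    .Size(1000)                   // documents per bulk request
    .MaxDegreeOfParallelism(2));  // concurrent requests

// blocks until everything is indexed or the timeout expires
bulkAll.Wait(TimeSpan.FromHours(1), response => { });

The main potential win is overlap: indexing one bulk request can proceed while the next batch is still being read from SQL Server.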
