I am trying to move data from a Microsoft SQL Server database into Elasticsearch. I am using EF 6 to generate the models (Code First from database) and NEST to serialize the objects into Elasticsearch.
With lazy loading it works, but it is far too slow to be usable. If I switch to eager loading by disabling lazy loading in the context constructor:
    public MyContext() : base("name=MyContext")
    {
        this.Configuration.LazyLoadingEnabled = false;
    }
and serialize like this:
    ElasticClient client = new ElasticClient(settings);
    var allObjects = context.objects
        .Include("item1")
        .Include("item2")
        .Include("item2.item1")
        .Include("item2.item1.item");
    client.IndexMany(allObjects);
I get a System.OutOfMemoryException before serialization even starts (that is, just from loading the data). I have around 2.5 GB of memory available, and the database holds about 110,000 items.
I have tried sorting the data and then using Skip and Take to serialize only a limited number of objects at a time, but I only managed to insert 60,000 objects into Elasticsearch before running out of memory. The garbage collector does not seem to free enough memory, even when I call it explicitly after each batch is inserted.
Is there a way to eager load a specific number of objects? Or is there another approach to serializing large datasets?
In hindsight, it was a silly mistake. This accomplished what I wanted:
    int numberOfObjects;
    // Count the rows once, using a short-lived context.
    using (var context = new MyContext())
    {
        numberOfObjects = context.objects.Count();
    }
    for (int i = 0; i < numberOfObjects; i += 10000)
    {
        // A new context per batch, so the entities loaded for the
        // previous batch are no longer referenced once it is disposed.
        using (var context = new MyContext())
        {
            var batch = context.objects.OrderBy(s => s.ID)
                .Skip(i)
                .Take(10000)
                .Include("item1")
                .Include("item2")
                .Include("item2.item1")
                .Include("item2.item1.item");
            client.IndexMany(batch);
        }
    }
This lets the garbage collector do its job: because a new context is created and disposed inside each loop iteration, the entities it tracked for the previous batch can be released. I don't know whether there is a faster way; I am able to insert about 100,000 objects into Elasticsearch in about 400 seconds.
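One variation that might be worth trying, which I have not measured: page by the last seen ID instead of by Skip, so that later batches do not have to step over all earlier rows. Below is a minimal, self-contained sketch of that loop. The `Item` class, `KeysetBatcher`, and the in-memory `table` are hypothetical stand-ins; with EF 6 the `loadPage` delegate would presumably open a new context per call and could also use `AsNoTracking()`, since the objects are only read.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the entity type; only the ID property matters here.
public class Item
{
    public int ID { get; set; }
}

public static class KeysetBatcher
{
    // Calls loadPage(lastId, batchSize) until it returns an empty page.
    // Each page is passed to indexPage and then dropped, so only one
    // page needs to be alive at a time. Returns the number of items seen.
    public static int IndexAll(
        Func<int, int, List<Item>> loadPage,
        Action<List<Item>> indexPage,
        int batchSize)
    {
        int lastId = int.MinValue;
        int total = 0;
        while (true)
        {
            List<Item> page = loadPage(lastId, batchSize);
            if (page.Count == 0) break;
            indexPage(page);
            total += page.Count;
            lastId = page[page.Count - 1].ID; // pages are ordered by ID
        }
        return total;
    }
}

public static class Demo
{
    public static void Main()
    {
        // In-memory data standing in for the database table (IDs 3, 6, ..., 75).
        List<Item> table = Enumerable.Range(1, 25)
            .Select(i => new Item { ID = i * 3 })
            .ToList();

        // Assumption: with EF 6 this would run inside "using (var context = new MyContext())"
        // and look roughly like
        //   context.objects.AsNoTracking().Include(...)
        //       .Where(o => o.ID > lastId).OrderBy(o => o.ID).Take(size).ToList();
        Func<int, int, List<Item>> loadPage = (lastId, size) =>
            table.Where(o => o.ID > lastId).OrderBy(o => o.ID).Take(size).ToList();

        // Stand-in for client.IndexMany(page): just record the batch sizes.
        var batchSizes = new List<int>();
        int total = KeysetBatcher.IndexAll(loadPage, page => batchSizes.Add(page.Count), 10);

        Console.WriteLine(total);                        // 25
        Console.WriteLine(string.Join(",", batchSizes)); // 10,10,5
    }
}
```

In the real loop, `indexPage` would be the call to `client.IndexMany`, and the batch size of 10000 from above could be kept. Whether this is actually faster depends on the table and its index on ID, so treat it as something to benchmark rather than a guaranteed improvement.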