Why is Entity Framework dbSet.Include taking so long to return?

I'm doing a simple "get" on a table and the "include" part of the routine is taking much longer than I would expect.

I've narrowed the performance issue down to this snippet of code:

private List<Task> GetFilteredTasksWithOptionalIncludes(EntityRepository<Task> repo, ITaskRequest model, TaskIncludesModel includes, string accountID)
{
    var includedEntities = new List<Expression<Func<Task, object>>>();

    includedEntities.Add(t => t.Document.Transaction.Account);

    includedEntities.Add(p => p.Signature);

    if (includes != null)
    {
        if (includes.IncludeWorkflowActions)
        {
            includedEntities.Add(p => p.Actions);
        }

        if (includes.IncludeFileAttachments)
        {
            includedEntities.Add(p => p.Attachments);
        }
    }

    IQueryable<Task> tasks = repo.GetAllIncluding(includedEntities.ToArray());  //RETURNS SLOW

    return tasks.ToList();  //RETURNS FAST
}

All my code runs very fast until it hits the repo.GetAllIncluding method shown below:

public IQueryable<TEntity> GetAllIncluding(params Expression<Func<TEntity, object>>[] includeProperties)
{
    foreach (var includeProperty in includeProperties)
    {
        dbSet.Include(includeProperty).Load();
    }

    return dbSet;
}

Stepping through the code, the GetAllIncluding() line takes 6 or 7 seconds to return, whereas tasks.ToList() returns in less than a second. That surprises me, because ToList() is where the actual SQL query gets run.

When I comment out the foreach loop that loads the includes, the entire call returns in under a second. What causes the includes to take so long? Is there a better way to do this?

Here is the SQL Profiler trace around the call, if it helps. Everything above the red line is from the GetAllIncluding call. Everything below the red line is the actual query for the data. Is there a more efficient way to do this? It doesn't seem like a fairly simple call should take 10 seconds.

[Image: SQL Profiler trace]

When you call .Load() in that loop, you execute the query against the database immediately and bring the results into the context.

The loop calls .Load() once per include expression, so you run a separate query over the whole set for every include before your actual query even starts.

I would suggest removing the .Load() call; you can still keep your including function. A basic Including function for a generic repository would be something like this:

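// In EF 6 the lambda overload of Include lives in the System.Data.Entity namespace.
public IQueryable<TEntity> Including(params Expression<Func<TEntity, object>>[] includeProperties)
{
    // Chain every Include onto one query; nothing is sent to the database here.
    IQueryable<TEntity> query = context.Set<TEntity>();
    return includeProperties.Aggregate(query, (current, includeProperty) => current.Include(includeProperty));
}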

Once you have called this and have your IQueryable, simply call .ToList() on it so the data is pulled from SQL only once.

Have a read of http://msdn.microsoft.com/en-gb/data/jj592911.aspx to see what the Load method is actually doing.
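To make the difference concrete, here is a minimal sketch using plain LINQ to Objects rather than Entity Framework. Everything in it is hypothetical: Source() is a stand-in for a table where each enumeration represents one round trip, and the Select steps stand in for Include calls. It only illustrates the general principle that forcing execution inside a loop repeats the work, while composing the query first and materializing once does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch
{
    // Number of times the stand-in "table" has been read from the start.
    public static int Enumerations;

    // Hypothetical stand-in for a database table.
    // The counter is bumped each time enumeration begins, not when Source() is called.
    public static IEnumerable<int> Source()
    {
        Enumerations++;
        yield return 1;
        yield return 2;
        yield return 3;
    }

    // Mirrors the original loop: each step is executed immediately, like .Load().
    public static int EagerInLoop(Func<int, int>[] steps)
    {
        Enumerations = 0;
        IEnumerable<int> table = Source();
        foreach (var step in steps)
        {
            table.Select(step).ToList(); // runs now; the result is discarded
        }
        table.ToList(); // the "real" query
        return Enumerations;
    }

    // Mirrors the Aggregate approach: compose everything, then execute once.
    public static int ComposedOnce(Func<int, int>[] steps)
    {
        Enumerations = 0;
        IEnumerable<int> query = steps.Aggregate(Source(), (current, step) => current.Select(step));
        query.ToList(); // the only point where the source is read
        return Enumerations;
    }

    public static void Main()
    {
        var steps = new Func<int, int>[] { x => x + 1, x => x * 2, x => x - 3 };
        Console.WriteLine(EagerInLoop(steps));  // prints 4
        Console.WriteLine(ComposedOnce(steps)); // prints 1
    }
}
```

With three steps, the eager loop reads the source four times (three in the loop plus the final ToList), while the composed version reads it once. Real Entity Framework queries differ in many details, but the round-trip count follows the same pattern.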

Edit based on your comments:

You could implement the function I posted above and use it much as you do now, explicitly calling Load() afterwards on the tasks queryable if you want.

private List<Task> GetFilteredTasksWithOptionalIncludes(EntityRepository<Task> repo, ITaskRequest model, TaskIncludesModel includes, string accountID)
{
    var includedEntities = new List<Expression<Func<Task, object>>>();

    includedEntities.Add(t => t.Document.Transaction.Account);

    includedEntities.Add(p => p.Signature);

    if (includes != null)
    {
        if (includes.IncludeWorkflowActions)
        {
            includedEntities.Add(p => p.Actions);
        }

        if (includes.IncludeFileAttachments)
        {
            includedEntities.Add(p => p.Attachments);
        }
    }

    IQueryable<Task> tasks = repo.Including(includedEntities.ToArray());
    tasks.Load();  // optional: runs the query once to populate the context; ToList() below runs it again
    return tasks.ToList();
}

As a follow-up to Thewads's answer: you could write a good old stored procedure that returns all of this in multiple result sets at once. You can then optimise the database schema with appropriate indexes and make it run as fast as possible.

That probably isn't going to be popular, but when you start talking about database performance it is the quicker way, and it is easier to work with because you have all the SQL tooling for performance tuning. For example, turn on query execution plans; you might find something insane is happening.

Another question would be: do you really need all this data? Are there no filter conditions you can apply before you load it all (assuming you fix the multiple loading issue)?
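As a rough illustration of filtering before loading, the sketch below applies a Where clause while the query is still an IQueryable and only then calls ToList(). The TaskItem and Account classes, the Id property, and the ForAccount helper are all hypothetical, since the real model is not shown in the question, and the in-memory list only stands in for something like repo.Including(...). With a provider such as Entity Framework, a filter composed this way would normally be translated into the SQL query rather than applied after everything has been loaded.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified entities; the real model in the question is not shown.
public class Account
{
    public string Id { get; set; }
}

public class TaskItem
{
    public int Id { get; set; }
    public Account Account { get; set; }
}

public static class FilterBeforeMaterialize
{
    // Adds the account filter to the query before it executes.
    public static List<TaskItem> ForAccount(IQueryable<TaskItem> tasks, string accountId)
    {
        return tasks
            .Where(t => t.Account.Id == accountId)
            .ToList(); // the only point where the query executes
    }

    public static void Main()
    {
        // In-memory stand-in for the repository query; not a real database.
        IQueryable<TaskItem> tasks = new List<TaskItem>
        {
            new TaskItem { Id = 1, Account = new Account { Id = "A" } },
            new TaskItem { Id = 2, Account = new Account { Id = "B" } },
            new TaskItem { Id = 3, Account = new Account { Id = "A" } },
        }.AsQueryable();

        Console.WriteLine(ForAccount(tasks, "A").Count); // prints 2
    }
}
```

The accountID parameter already passed to GetFilteredTasksWithOptionalIncludes in the question looks like a natural candidate for such a filter, though whether it fits depends on the actual model.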
