
Entity Framework related objects using generics

There are many questions about using Include() and Load() to pull in related table information when using LINQ to Entities. I have a different spin on this question.

My situation:

I have a table that holds many time records for each user in the system. I use a repository pattern with generics for development, so all of my entities go through an interface that exposes standard method calls. I have lazy loading turned off and load all data myself. The repository code for loading all records from a table along with its related tables looks like this:

public class Repository<T> : IRepository<T> where T : class
{
    protected readonly ApplicationDbContext Context;

    public Repository(IConnectionHelper connection)
    {
        Context = connection.Context;
    }
    public virtual DbSet<T> ObjectSet
    {
        get { return Context.Set<T>(); }
    }  
    public List<T> GetAll(String[] include, Expression<Func<T, bool>> predicate)
    {
        DbQuery<T> outQuery = null;
        foreach (String s in include)
        {
            outQuery = ObjectSet.Include(s);
            outQuery.Load();
        }
        return outQuery.Where(predicate).ToList();
    }
}

The call to the method is like this:

string[] includes = { "User.UserProfile", "CampaignTimeClocks.CampaignRole.Campaign", "Site", "Type" };
DateTime uTcCurrent = GetUtc();
DateTime MinClockinDate = uTcCurrent.AddHours(-10);
List<TimeClock> tcPending = _timeClock.GetAll(includes, x => (x.PendingReview || x.ClockInDate < MinClockinDate && x.ClockOutDate == null) && (x.Site.Id == currentUser.SiteId));

When this method runs and loads the first include, User.UserProfile, it loads all the timeclock records and relates them to all of the users. This takes upwards of a minute, which is far too long: the final result is only 185 records, but the initial query is loading 27,000 timeclock records * 560 users, or about 15 million rows, and this is only going to get worse as time goes on.

The question is: how do I do this without that load overhead? I know I can chain includes, but since the set of includes changes depending on what is being called and what I am doing with the data, I cannot simply hard-code a chain of includes.

I have also tried:

List<TimeClock> testLst =  _timeClock.GetAll(x => x.PendingReview || 
     (x.ClockInDate < MinClockinDate && x.ClockOutDate == null))
          .Select(x => new TimeClock{Id = x.Id,
                                     ClockInDate = x.ClockInDate, 
                                     ClockOutDate = x.ClockOutDate,
                                     TotalClockTime = x.TotalClockTime,
                                     Notes = x.Notes, 
                                     PendingReview = x.PendingReview, 
                                     Type = x.Type,
                                     User = x.User, 
                                     CampaignTimeClocks = x.CampaignTimeClocks,
                                     TimeClockAdjustments = x.TimeClockAdjustments,
                                     Site = x.User.Site}).ToList();

This will give me the User.Profile information but the Site and Type properties are null.

So I am a bit lost as to how to load the data I need here.

All help is greatly appreciated.

Can you get the initial list first

List<TimeClock> testLst =  _timeClock.Where(x => x.PendingReview || (x.ClockInDate < MinClockinDate && x.ClockOutDate == null)).ToList();

and then call a modified GetAll() that takes a T as an argument?
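
For illustration only, a method along those lines might look roughly like the sketch below in EF6, explicitly loading navigations via Context.Entry on the same context. The LoadRelated name, its signature, and the restriction to single-level navigation names are my own assumptions, not part of the question's repository:

// Hypothetical addition to Repository<T>; requires
// using System.Collections.Generic; and using System.Data.Entity.Infrastructure;
public void LoadRelated(IEnumerable<T> entities, string[] include)
{
    foreach (T entity in entities)
    {
        DbEntityEntry<T> entry = Context.Entry(entity);

        foreach (string name in include)
        {
            // Only single-level names ("Site", "Type") are handled here;
            // nested paths such as "User.UserProfile" would have to be
            // walked one level at a time.
            DbMemberEntry member = entry.Member(name);

            var reference = member as DbReferenceEntry;
            if (reference != null && !reference.IsLoaded)
            {
                reference.Load();   // one query per reference navigation
                continue;
            }

            var collection = member as DbCollectionEntry;
            if (collection != null && !collection.IsLoaded)
            {
                collection.Load();  // one query per collection navigation
            }
        }
    }
}

Note that this issues one query per navigation per entity (an N+1 pattern), so for larger result sets a set-based approach like the one in the next answer will usually be cheaper.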

Each Include you add ends up as a join executed in the database. Suppose your left table is very big, say 1024 bytes per record, and that you have many details, say 1000, while the detail record size is only 100 bytes. The join repeats the left table's data 1000 times; all of that is put on the wire by the database, and EF has to filter out the duplicates to create your left-side instance.

It can be better not to use Include and to do an explicit load instead, basically executing two queries on the same context.

I have an example like this; it is different from yours, but I hope you get the idea. It can be up to 10 times faster than relying on Include. (A database can only handle a limited number of joins efficiently, by the way.)

var adressen = adresRepository
    .Query(r => r.RelatieId == relatieId)
    .Include(i => i.AdresType)
    .Select().ToList();

var adresids = (from a in adressen select a.AdresId).ToList();

IRepositoryAsync<Comm> commRepository = unitOfWork.RepositoryAsync<Comm>();

var comms = commRepository
    .Query(c => adresids.Contains(c.AdresId))
    .Include(i => i.CommType)
    .Select();

For CommType and AdresType I use Include because there is a 1-to-1 relationship; by avoiding too many joins, my multiple queries end up faster than a single query built with Include. I am not including the Comms in the first query in order to avoid the second one; the point is that two queries are faster in this case than a single one.

Note that my code is built using my own repositories, so this code will not work for you, but you can get the idea behind this.

The way I found to do this more efficiently is to change the DbQuery to an IQueryable, chain the includes, return the executed query results, and remove the DbQuery.Load() altogether. This changed the execution time of the query from seconds to milliseconds.

    public List<T> GetAll(String[] include)
    {
        IQueryable<T> outQuery = ObjectSet;
        foreach (String s in include)
        {
            outQuery = outQuery.Include(s);
        }
        return outQuery.ToList();
    }
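
For completeness, the overload that also takes the predicate from the original call site would presumably follow the same pattern. This is my reading of the question's signature rather than code from the answer; it relies on the string-based Include extension for IQueryable<T> (using System.Data.Entity;):

public List<T> GetAll(String[] include, Expression<Func<T, bool>> predicate)
{
    // Start from the raw set and compose the includes onto one deferred
    // query; nothing hits the database until ToList() executes it.
    IQueryable<T> outQuery = ObjectSet;
    foreach (String s in include)
    {
        outQuery = outQuery.Include(s);
    }
    return outQuery.Where(predicate).ToList();
}

The original call, _timeClock.GetAll(includes, x => ...), should then work unchanged, with the includes and the filter translated into a single SQL statement.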
