
EF Core query is extremely slow with many relations between tables

I have an EF Core query like this:

var existingViolations = await _context.Parent
    .Where(p => p.ProjectId == projectId)
    .Include(p => p.Relation1)
    .Include(p => p.Relation2)
        .ThenInclude(r => r.Relation21)
    .Include(p => p.Relation3)
    .AsSplitQuery()
    .ToListAsync();

This query usually takes 55-65 seconds, which sometimes causes database timeouts. Every table in the query, including the parent table, contains 30k-60k rows and 3-6 columns. I have tried splitting it up into smaller queries using LoadAsync() like this:

_context.ChangeTracker.LazyLoadingEnabled = false;
_context.ChangeTracker.AutoDetectChangesEnabled = false;

await _context.Relation1.Where(r1 => r1.Parent.ProjectId == projectId).LoadAsync();

await _context.Relation2.Where(r2 => r2.Parent.ProjectId == projectId).Include(r2 => r2.Relation21).LoadAsync();

await _context.Relation3.Where(r3 => r3.Parent.ProjectId == projectId).LoadAsync();

var result = await _context.Parent.Where(p => p.ProjectId == projectId).ToListAsync();

That shaves about 5 seconds off the query time, so nothing to brag about. I've done some timings, and it's the last line ( var result = await _context.Parent.Where(p => p.ProjectId == projectId).ToListAsync(); ) that takes by far the longest to complete, about 90% of the total time.

How can I optimize this further?

Without seeing the real entities and how they might be configured, it's anyone's guess.

Generally speaking, when looking at performance issues like this, the first thing I would tackle is "what is this data being loaded for?" Typically, when I see queries using a lot of Includes, the data is for a read operation: something loaded for a view, or a computation based on the selected data. Projecting down to a simpler model can help significantly if you only need a few columns from each table. The benefit of projecting with Select across the related data, into either a DTO/ViewModel class for a view or an anonymous type for a computation, is that Include pulls back every column of every eager-loaded table, whereas a projection only pulls back the columns it references. This can be critically important when tables contain large text/binary columns that you don't need at all, or don't need right away. It also matters when the database server is some distance from the consuming client or web server. Less data over the wire = faster performance, though the issue right now sounds like the DB query itself.
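As a rough illustration of the projection idea, here is a minimal sketch. The real entities are not shown in the question, so everything below is an assumption: the entity shapes, the Name and Code columns, the ParentSummary DTO, and the guess that Relation1 and Relation2 are collection navigations.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

// Hypothetical entities; the real ones are not shown in the question.
public class Parent
{
    public int Id { get; set; }
    public int ProjectId { get; set; }
    public string Name { get; set; } = "";
    public List<Relation1> Relation1 { get; set; } = new();
    public List<Relation2> Relation2 { get; set; } = new();
}

public class Relation1
{
    public int Id { get; set; }
    public int ParentId { get; set; }
    public string Code { get; set; } = "";
}

public class Relation2
{
    public int Id { get; set; }
    public int ParentId { get; set; }
}

// Hypothetical read model holding only what the consumer needs.
public class ParentSummary
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public List<string> Relation1Codes { get; set; } = new();
    public int Relation2Count { get; set; }
}

public static class ParentQueries
{
    // Only the columns referenced inside Select are requested from the
    // database, instead of every column of every included table.
    public static Task<List<ParentSummary>> LoadSummariesAsync(
        DbContext context, int projectId) =>
        context.Set<Parent>()
            .Where(p => p.ProjectId == projectId)
            .Select(p => new ParentSummary
            {
                Id = p.Id,
                Name = p.Name,
                Relation1Codes = p.Relation1.Select(r => r.Code).ToList(),
                Relation2Count = p.Relation2.Count
            })
            .ToListAsync();
}
```

Because the result is a plain DTO rather than an entity, the change tracker has nothing to track, which may also help if part of the time is being spent on the client side rather than in the database.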

The next thing to check would be the relationships between all of the tables, and any relevant configuration in EF compared with the table design. Waiting a minute to pull a few records from 30-60k rows is ridiculously long, and I would be highly suspicious of a flawed relationship mapping that isn't using FKs/indexes. Another place to look would be to run a profiler against the database to capture the exact SQL statement(s) being run, then execute those manually and inspect their execution plans. That might reveal schema problems, or some weirdness in the entity relationship mapping that produces very inefficient queries.
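If a database profiler is not at hand, EF Core (5.0 and later) can log the generated SQL itself through LogTo. The sketch below combines that with an explicit FK and index mapping so the two can be compared against the real schema. DiagnosticContext, the ParentId column, and the entity shapes are assumptions, not the asker's actual model.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;

// Hypothetical entities; the real ones are not shown in the question.
public class Parent
{
    public int Id { get; set; }
    public int ProjectId { get; set; }
    public List<Relation1> Relation1 { get; set; } = new();
}

public class Relation1
{
    public int Id { get; set; }
    public int ParentId { get; set; }
    public Parent Parent { get; set; } = null!;
}

// Hypothetical context; the provider (SQL Server, PostgreSQL, ...) is
// supplied by the caller through the options.
public class DiagnosticContext : DbContext
{
    public DiagnosticContext(DbContextOptions<DiagnosticContext> options)
        : base(options) { }

    // Writes each executed SQL command to the console so it can be
    // copied out and run manually to inspect its execution plan.
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder.LogTo(Console.WriteLine, LogLevel.Information);

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        // Index on the column used by the Where(p => p.ProjectId == ...) filter.
        modelBuilder.Entity<Parent>().HasIndex(p => p.ProjectId);

        // Explicit FK mapping; should match the FK column in the table.
        modelBuilder.Entity<Relation1>()
            .HasOne(r => r.Parent)
            .WithMany(p => p.Relation1)
            .HasForeignKey(r => r.ParentId);
    }
}
```

Note that HasIndex only describes the model: it affects schemas created through migrations, so with an existing database the index still has to be verified (or created) on the table itself.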

After that, use a process of elimination to see whether there is a bad relationship. Remove the eager-load Include statements one by one and see how long each version of the query takes. If a particular Include is responsible for a drastic slow-down, drill down into that relationship to see why that might be.
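One way to run that elimination is a small timing helper like the sketch below. IncludeTimings and TimeAsync are hypothetical names, and the usage in the trailing comments assumes the _context and projectId from the question.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;

public static class IncludeTimings
{
    // Executes one query variant and reports how long it took to
    // run and materialize the results.
    public static async Task<TimeSpan> TimeAsync<T>(string label, IQueryable<T> query)
    {
        var stopwatch = Stopwatch.StartNew();
        var rows = await query.ToListAsync();
        stopwatch.Stop();

        Console.WriteLine($"{label}: {rows.Count} rows in {stopwatch.ElapsedMilliseconds} ms");
        return stopwatch.Elapsed;
    }
}

// Possible usage, one variant per run:
//
//   var baseQuery = _context.Parent.Where(p => p.ProjectId == projectId);
//   await IncludeTimings.TimeAsync("no includes", baseQuery);
//   await IncludeTimings.TimeAsync("Relation1 only", baseQuery.Include(p => p.Relation1));
//   await IncludeTimings.TimeAsync("Relation3 only", baseQuery.Include(p => p.Relation3));
```

It is probably worth using a fresh context instance for each measurement, since entities already tracked from an earlier run can skew the later timings.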

That should give you a couple of avenues to check. Consider revising your question with the actual entities and any further troubleshooting results.


 