
Optimize LINQ query with related entities

I am new to LINQ, and I started by writing this query:

var dProjects = Projects
    .Select(p => new Models.Project
    {
        ProjectID = p.ProjectID,
        Status = p.Status,
        ExpiresOn = p.ExpiresOn,
        // Most recent comment text for this project
        LatestComments = p.ProjectComments
                          .OrderByDescending(pc => pc.CreatedOn)
                          .Select(pc => pc.Comments)
                          .FirstOrDefault(),
        // IDs of all files attached to this project
        ProjectFileIDs = p.ProjectFiles
                          .Select(pf => pf.BinaryFileID)
                          .AsQueryable()
    })
    .AsQueryable<Models.Project>();

I already know this query will perform really slowly, because related entities like ProjectComments and ProjectFiles will create nested selects, though it works and gives me the results I need.

How can I optimize this query and get the same results? One of my guesses would be to use an inner join, but ProjectComments and ProjectFiles already have a relationship in the database through keys, so I am not sure what we would achieve by setting up the relationship again.

Basically, I need to know which approach is best from a performance perspective. One thing to note is that I am sorting ProjectComments and taking only the most recent one. Should I be using a combination of join and group by into? Help will be much appreciated. Thanks.

UPDATED:

Sorry if I wasn't clear enough about what I am trying to do. In the front end I have a grid that shows a list of projects, each with its latest project comment and a list of all the files associated with the project, so users can click on those links and open the documents. The query above works and shows the following in the grid:

Project ID (from the Project table)
Status (from the Project table)
ExpiresOn (from the Project table)
LatestComments (latest entry from the ProjectComments table, which has Project ID as a foreign key)
ProjectFileIDs (list of file IDs from the ProjectFiles table, which has Project ID as a foreign key; I use those file IDs to create links so users can open the files)

So everything is working and set up, but the query is a little slow. Right now we have very little data (only test data), but once this is launched I am expecting a lot of users and data, so I want to optimize this query as far as possible before it goes live. I am pretty sure this is not the best approach, because it will create nested selects.

In Entity Framework, you can often improve query performance by returning the entities as an object graph instead of a projection. Entity Framework generates efficient SQL for all but the most complex queries, and it lets you choose between eager loading (related items fetched together with the parent in a single query) and lazy loading (related items not loaded from the database until they are actually accessed). The MSDN reference on loading related objects is a good place to start.

For your specific query, you could apply this technique along the following lines:

// Needs "using System.Data.Entity;" for the lambda overload of Include.
// Include only accepts a navigation-property path; it cannot sort, project,
// or take the first item, so the latest comment is picked after loading.
var dbProjects = yourContext.Projects
                    .Include(p => p.ProjectComments)
                    .Include(p => p.ProjectFiles);

Note the .Include() calls, which request eager loading.

From the MSDN reference on Loading Related Objects:
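
Once the related collections have been loaded, the latest comment and the file ID list can be picked out in memory. The following is only a minimal sketch of that shaping step: it uses plain in-memory classes as hypothetical stand-ins for the EF entities, and the sample data is made up, so it shows the shape of the result rather than any database behaviour.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the EF entities (not the real model classes).
class ProjectComment
{
    public DateTime CreatedOn { get; set; }
    public string Comments { get; set; }
}

class ProjectFile
{
    public int BinaryFileID { get; set; }
}

class Project
{
    public int ProjectID { get; set; }
    public List<ProjectComment> ProjectComments { get; } = new List<ProjectComment>();
    public List<ProjectFile> ProjectFiles { get; } = new List<ProjectFile>();
}

static class Program
{
    static void Main()
    {
        // Assumed sample data, standing in for entities loaded with Include.
        var loaded = new List<Project>
        {
            new Project
            {
                ProjectID = 1,
                ProjectComments =
                {
                    new ProjectComment { CreatedOn = new DateTime(2024, 1, 1), Comments = "first" },
                    new ProjectComment { CreatedOn = new DateTime(2024, 2, 1), Comments = "second" }
                },
                ProjectFiles =
                {
                    new ProjectFile { BinaryFileID = 10 },
                    new ProjectFile { BinaryFileID = 11 }
                }
            }
        };

        // Shape each loaded project into one grid row, in memory.
        var rows = loaded.Select(p => new
        {
            p.ProjectID,
            // Newest comment text, or null when the project has no comments.
            LatestComments = p.ProjectComments
                              .OrderByDescending(pc => pc.CreatedOn)
                              .Select(pc => pc.Comments)
                              .FirstOrDefault(),
            ProjectFileIDs = p.ProjectFiles.Select(pf => pf.BinaryFileID).ToList()
        }).ToList();

        foreach (var row in rows)
        {
            Console.WriteLine(
                $"{row.ProjectID}: {row.LatestComments} [{string.Join(", ", row.ProjectFileIDs)}]");
        }
    }
}
```

Keep in mind that this approach pulls every comment for every project into memory just to keep one of them, so it is worth comparing the generated SQL and timings against the original projection before choosing.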

Performance Considerations

When you choose a pattern for loading related entities, consider the behavior of each approach with regard to the number and timing of connections made to the data source versus the amount of data returned by and the complexity of using a single query. Eager loading returns all related entities together with the queried entities in a single query. This means that, while there is only one connection made to the data source, a larger amount of data is returned in the initial query. Also, query paths result in a more complex query because of the additional joins that are required in the query that is executed against the data source.

Explicit and lazy loading enables you to postpone the request for related object data until that data is actually needed. This yields a less complex initial query that returns less total data. However, each successive loading of a related object makes a connection to the data source and executes a query. In the case of lazy loading, this connection occurs whenever a navigation property is accessed and the related entity is not already loaded.

Do you get any performance boost if you add Include statements before the Select?

Example:

var dProjects = Projects
    .Include(p => p.ProjectComments)
    .Include(p => p.ProjectFiles)

Include allows all matching ProjectComments and ProjectFiles to be eagerly loaded. See Loading Related Entities for more details.


 