EF Core IQueryable, Count() efficiency and best practices

I'm writing queries (CQRS pattern) using EF Core 6 that will be consumed by my controller. Besides the view models, I would also like to return some additional data to the client to enable proper pagination (e.g. total count, returned count and remaining count).

My original query looked like this:

_posts //DbSet
    .Where(x => x.Status.Equals(PostStatus.Published))
    .OrderByDescending(x => x.CreatedAt)
    .Where(x => x.CreatedAt < createdAtCursor) //cursor pagination based on created at date
    .Include(x => x.Translations
        .Where(x => x.Language.Equals(language)))
    .Where(x => x.Translations != null) //excluding all posts without translation in a given language
    .Take(pageSize)
    .Select(x => x.ToPreviewModel())
    .AsNoTracking();

So I came up with the following solutions:

Option 1: Split query, run CountAsync() in between

var allPublishedPosts = _posts
        .Where(x => x.Status.Equals(PostStatus.Published))
        .Include(x => x.Translations
            .Where(x => x.Language.Equals(language)))
        .Where(x => x.Translations != null);

int totalCount = await allPublishedPosts
    .CountAsync()
    .ConfigureAwait(false);

var olderThanPublishedPosts = allPublishedPosts
    .Where(x => x.CreatedAt < createdAtCursor);

int olderThanCount = await olderThanPublishedPosts
    .CountAsync()
    .ConfigureAwait(false);

var returnedPublishedPosts = olderThanPublishedPosts
     .OrderByDescending(x => x.CreatedAt)
     .Take(pageSize)
     .Select(x => x.ToPreviewModel())
     .AsNoTracking();

int returnedCount = await returnedPublishedPosts
    .CountAsync()
    .ConfigureAwait(false);

Here I'm worried about Include() being at the top of the query. I guess it could be quite expensive to run it on a larger set than in the original query. I also need to call the database 3 times and wait for each CountAsync() to finish, one by one.

Option 2: Split query, rewrite it to take DbContext / DbSet as a parameter and run CountAsync() in parallel on multiple contexts

IQueryable<PostBase> allPublishedPostsQuery(DbSet<PostBase> posts) =>
    posts
        .Where(x => x.Status.Equals(PostStatus.Published))
        .Include(x => x.Translations
            .Where(x => x.Language.Equals(language)))
        .Where(x => x.Translations != null);

//...so on for the other parts of the query

int[]? countResults;
using (var context1 = _contextFactory.CreateDbContext())
using (var context2 = _contextFactory.CreateDbContext())
using (var context3 = _contextFactory.CreateDbContext())
{
    countResults = await Task.WhenAll(
        allPublishedPostsQuery(context1.Posts)
            .CountAsync(),
        olderThanPublishedPostsQuery(context2.Posts)
            .CountAsync(),
        returnedPublishedPostsQuery(context3.Posts)
            .CountAsync())
        .ConfigureAwait(false);
};

int totalPostsCount = countResults[0];
int olderThanPostsCount = countResults[1];
int returnedPostsCount = countResults[2];

Looks like overkill to me, but I don't know how it compares performance-wise. It's a pity we cannot do it on a single context.

Option 3: ??? Ideally I would like to do it in one complete query, but I'm not sure if this is possible.

Eventually I could leave all the calculations to the front end and provide only the total count, but I'm still curious how to solve this in the best way in the future and possibly improve the efficiency of the query itself.

When it comes to pagination, most controls just want a row count along with a page size and page #. For the most part EF is pretty efficient when you tell it to fetch a count. There shouldn't be a need to execute multiple counts. I'll typically just leave the OrderBy* clauses off the query, take the count, then append the OrderBy* clauses and fetch the page of data.
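A minimal sketch of that order of operations, written against a plain IQueryable<T> so it also runs with LINQ to Objects. The Paging class, the CountThenPage method and its parameters are hypothetical names for illustration; with EF Core you would presumably use CountAsync() and ToListAsync() instead of the synchronous calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class Paging
{
    // Hypothetical helper: counts the filtered query first, then applies
    // ordering and Skip/Take only to the query that fetches the page.
    public static (int TotalCount, List<T> Items) CountThenPage<T, TKey>(
        IQueryable<T> filtered,
        Expression<Func<T, TKey>> orderByDescending,
        int pageNumber, // 1-based page number
        int pageSize)
    {
        // The count does not depend on ordering, so it is taken before OrderBy.
        int totalCount = filtered.Count();

        List<T> items = filtered
            .OrderByDescending(orderByDescending)
            .Skip((pageNumber - 1) * pageSize)
            .Take(pageSize)
            .ToList();

        return (totalCount, items);
    }
}
```

Called as Paging.CountThenPage(query, x => x.CreatedAt, pageNumber, pageSize), this issues one count and one page fetch over the same filter.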

That said, with very large datasets and potentially dynamic criteria used to filter that data, count queries can get fairly expensive. If you are facing a situation where you're potentially looking at very large count values and slow count queries, what I can suggest is a bit of a "cheat":

Take a situation where you have a pagination control set to a page size of 50 and you display 10 page selectors at a time: 1-10, then a >> if there are more pages.

Build the query without OrderBy/pagination. Based on the selected page number, determine how many sets of 10 pages need to be loaded:

var pages = ((pageNumber / 10) +1) * 10;

So for instance, if loading page 1, pages = 10. If loading page 11 or 12, pages = 20.

Next, limit your total rows based on the pages needed + 1 row, and base your count on this figure.

var count = query.Take(pages * pageSize + 1).Count();

Lastly, check the count against pages * pageSize + 1. If it is equal, you have more pages' worth of rows to load; otherwise the count reflects the actual number of records.
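The steps above can be sketched as one helper. This is only an illustration under stated assumptions: the AdaptiveCount class and its member names are hypothetical, page numbers are taken to be 1-based, and the block calculation uses (pageNumber - 1) so that page 10 stays in the first set of 10 pages (the expression as written above returns 20 for page 10).

```csharp
using System;
using System.Linq;

public static class AdaptiveCount
{
    public const int PagesPerBlock = 10; // page selectors shown at a time

    // Number of pages the capped count has to cover for a 1-based page number.
    // Assumption: (pageNumber - 1) keeps page 10 in the first block of pages.
    public static int PagesToCover(int pageNumber) =>
        ((pageNumber - 1) / PagesPerBlock + 1) * PagesPerBlock;

    // Counts at most pages * pageSize + 1 rows of an unordered, unpaged query.
    public static (int Count, bool HasMore) CountCapped<T>(
        IQueryable<T> query, int pageNumber, int pageSize)
    {
        int cap = PagesToCover(pageNumber) * pageSize;
        int counted = query.Take(cap + 1).Count();

        // Reaching cap + 1 means at least one row lies beyond the covered pages.
        bool hasMore = counted == cap + 1;
        return (hasMore ? cap : counted, hasMore);
    }
}
```

When HasMore is false, Count is the exact number of rows; when it is true, Count is only a lower bound equal to the cap.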

What this gives you is an adaptive page count. When loading the first page of results it will count over at most 501 rows (10 pages of 50, plus 1). If there are <= 500 rows, the pagination control will display the correct # of pages and we can display the actual count. If there are > 500 rows, the pagination will display 10 pages and expect an 11th page (rendering the >> next page control), and we display the row count as something like "500+" or "at least 500" rather than the specific count. Where the client may want an accurate row count for whatever reason, I will render this as a hyperlink that actually does a full count and updates the total. If the user selects the >> next page, requesting page #11, the count check bumps its limit to 20 pages rather than 10. If there are more than 1000 results, the rendering would show pages 11-20 with the >> next page control, and a row count of "1000+".
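As a rough illustration of the rendering side, the sketch below turns a capped count and a flag saying whether the cap was hit into the row-count label and the range of page selectors. The PagerDisplay class and its members are hypothetical names, and 1-based page numbers are assumed.

```csharp
using System;

public static class PagerDisplay
{
    // "500+" when the capped count was hit, otherwise the exact number.
    public static string RowCountLabel(int count, bool hasMore) =>
        hasMore ? $"{count}+" : count.ToString();

    // First and last page selector to show, and whether to render ">>".
    public static (int First, int Last, bool ShowNext) PageSelectors(
        int count, bool hasMore, int pageNumber, int pageSize, int pagesPerBlock = 10)
    {
        // First selector of the block that contains the requested page.
        int first = (pageNumber - 1) / pagesPerBlock * pagesPerBlock + 1;

        // Pages known to exist from the count (ceiling division, at least 1).
        int knownPages = Math.Max(1, (count + pageSize - 1) / pageSize);

        int last = Math.Min(first + pagesPerBlock - 1, knownPages);
        return (first, last, hasMore);
    }
}
```

With a page size of 50, a capped count of 500 on page 1 yields selectors 1-10 plus >>, while an exact count of 320 yields selectors 1-7 with no >>.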

The limitation of this approach is that you cannot give the user the ability to go to a specific page, or show the accurate row count for every query, unless they explicitly request it. But this is weighed against improving the most typical scenarios, where users expect to search for and find their results on the first page or first few pages, and would rarely go beyond the first 10 pages before refining their search criteria. As the old adage goes, "the best place to hide a dead body is on page 2 of Google's search results." Row counts and accurate pagination results for very large or complex data sets can come at a rather high cost for little benefit.
