
Count total size of records before paging and return them after paging without hitting the database twice

I have a paging method that takes the page and limit and applies them to the set it is given (which may already have pre-filters/queries applied). Everything from counting the records to applying the pagination happens inside it, so nothing else has to be done on the server side (it only counts the final result, which is stored in memory anyway).

        public async Task<PagingServiceResponse<T>> Apply<T>(IQueryable<T> set)
        {
            var httpQuery = _accessor.HttpContext.Request.Query;
            var pageValue = httpQuery.LastOrDefault(x => x.Key == "page").Value;
            if (pageValue.Count > 0) int.TryParse(pageValue, out _page);
            var limitValue = httpQuery.LastOrDefault(x => x.Key == "limit").Value;
            if (limitValue.Count > 0) int.TryParse(limitValue, out _limit);
            if (_limit > 1000 || _limit <= 0) _limit = 1000;
            if (_page <= 0) _page = 1;
            _size = await set.CountAsync();
            set = set.Take(_limit);
            if (_page > 1) set = set.Skip(_page * _limit);

            var data = await set.ToListAsync();
            var currentSize = data.Count;
            return new PagingServiceResponse<T>
            {
                Data = data,
                Size = _size,
                CurrentSize = currentSize,
                Page = _page,
                PerPage = _limit
            };
        }

So the problem is that this hits the database twice: once for the total count (CountAsync) and once for the data (ToListAsync).

I'm trying to avoid that because the query is executed twice, and it isn't a plain query: there are filter operations applied to it.

If anyone has advice on another approach, I'm all ears.

I'm using PostgreSQL and Entity Framework Core (Npgsql).

No. The whole premise of pagination is that you need to know the complete row count before obtaining a sub-set of the total records. The simplest way to get both in a single query hit is to load all records, which is a far worse option for large sets.

One issue I see is that you call Take to limit the rows (to at most 1000) and only then Skip based on the page #. Skip is applied to the already-limited result, so for any page after the first you skip past every row Take returned and get an empty list back. With a 1-based page, Skip(_page * _limit) would also skip one page too many. Normally I'd expect a paging query to look more like:

    var pagedData = set.Skip(page * pageSize).Take(pageSize).ToList();

where page is 0-based (0 = page #1). This ensures that at most pageSize rows (e.g. 25) are pulled back.
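A minimal sketch of the question's `Apply` logic with Skip and Take in the usual order, assuming the 1-based `page` from the query string is converted to the 0-based index described above. `ApplyPaging` is a hypothetical name, and the sketch runs against an in-memory `IQueryable` (`AsQueryable()`) so it can execute without a database; with EF Core you would call `CountAsync`/`ToListAsync` instead and add an `OrderBy` so pages are stable. It still runs two queries, which is the point of this answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper mirroring the question's Apply: 'page' is 1-based and
// 'limit' is the page size, clamped the same way as the original code.
static (int Size, List<T> Data, int Page, int PerPage) ApplyPaging<T>(
    IQueryable<T> set, int page, int limit)
{
    if (limit > 1000 || limit <= 0) limit = 1000;
    if (page <= 0) page = 1;

    // Query 1: total count of the (already filtered) set.
    int size = set.Count();

    // Query 2: Skip first, then Take, converting the 1-based page to a 0-based index.
    // With EF Core, apply an OrderBy before this so page boundaries are deterministic.
    List<T> data = set.Skip((page - 1) * limit).Take(limit).ToList();

    return (size, data, page, limit);
}

var numbers = Enumerable.Range(1, 60).AsQueryable();
var result = ApplyPaging(numbers, page: 3, limit: 25);
Console.WriteLine($"{result.Size} total, {result.Data.Count} on page {result.Page}, first = {result.Data[0]}");
```

With 60 rows and a limit of 25, page 3 returns the last 10 rows (51 to 60).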

Some things you can do to further mitigate the cost of pagination queries and obtaining a count:

  1. Structure your query build & execution so that ordering and projections (Select / ProjectTo) are applied after obtaining the Count.

  2. Make sure the context is short-lived and "fresh". This won't speed up the Count, but loading the sub-set gets slower the more entities the context is already tracking.

  3. When an accurate count is not needed, provide a rough one that can be expanded as users move to later pages, with an option to retrieve the complete count.
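A minimal sketch of point 1, using an in-memory `IQueryable` as a stand-in for an EF Core `DbSet`; the row shape (`Id`, `IsActive`) and the `"Order "` projection are hypothetical. The count is taken from the filtered query alone, and ordering and projection are only added to the query that loads the page.

```csharp
using System;
using System.Linq;

// Stand-in for an EF Core set: hypothetical rows with Id and IsActive.
var orders = Enumerable.Range(1, 100)
    .Select(i => new { Id = i, IsActive = i % 2 == 0 })
    .AsQueryable();

int pageIndex = 1;   // 0-based, so this is page #2
int pageSize = 25;

// Build the filtered query first; the Count runs against this shape only,
// with no ordering and no projection.
var filtered = orders.Where(o => o.IsActive);
int total = filtered.Count();

// Ordering and projection are applied only to the query that loads the page.
var pageRows = filtered
    .OrderByDescending(o => o.Id)
    .Select(o => "Order " + o.Id)
    .Skip(pageIndex * pageSize)
    .Take(pageSize)
    .ToList();

Console.WriteLine($"{total} matching; page #{pageIndex + 1} runs from {pageRows[0]} to {pageRows[^1]}");
```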

Getting a rough count is similar to how a Google search gives an approximation rather than a real count of results. The relatively simple technique I use is based on the current page size and the # of pages displayed by the pager. The pagination control needs to be tweaked so it doesn't display navigation to the "Last" page, and the record-count display needs to be adjusted as well.

For example, say the pager shows 10 pages with a page size of 25. Instead of a full count, I count at most the top ({PageSize} x {MaxPageCount} + 1) rows, or 251. To get maxPageCount, we look at the current page number relative to the number of pages the pager displays (i.e. 10):

    int maxPageCount = ((page / 10) + 1) * 10;   // page is 0-based; 10 = pages shown by the pager
    int roughCountLimit = pageSize * maxPageCount + 1;

    int rowCount = set.Take(roughCountLimit).Count();
    bool isRoughCount = rowCount == roughCountLimit;
    var pagedData = set.Skip(page * pageSize).Take(pageSize).ToList();

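For pages 1 to 10, the count covers at most 11 pages' worth of rows: 10 full pages (250 rows) plus one extra row that tells us an 11th page exists. I.e.:

    page #1  (0) / 10 = 0.  (0 + 1) * 10 = 10.
    page #2  (1) / 10 = 0.  (0 + 1) * 10 = 10.
    page #10 (9) / 10 = 0.  (0 + 1) * 10 = 10.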
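The snippet above can be wrapped into a self-contained helper for experimenting. `RoughPage` and the `pagesShown` parameter are hypothetical names (the answer hard-codes 10), and an in-memory `IQueryable` stands in for the filtered EF Core query. Against a real database, `Take(roughCountLimit).Count()` should translate to a COUNT over a LIMITed subquery, so the database can stop after `roughCountLimit` rows; that translation is an assumption worth checking in your provider's generated SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// 'page' is 0-based; 'pagesShown' is how many page links the pager displays (10 above).
static (int RowCount, bool IsRoughCount, List<T> PagedData) RoughPage<T>(
    IQueryable<T> set, int page, int pageSize, int pagesShown = 10)
{
    int maxPageCount = (page / pagesShown + 1) * pagesShown;
    int roughCountLimit = pageSize * maxPageCount + 1;

    // Count no more than roughCountLimit rows.
    int rowCount = set.Take(roughCountLimit).Count();
    // Hitting the limit means "at least this many", not an exact total.
    bool isRoughCount = rowCount == roughCountLimit;

    var pagedData = set.Skip(page * pageSize).Take(pageSize).ToList();
    return (rowCount, isRoughCount, pagedData);
}

var rows = Enumerable.Range(1, 100_000).AsQueryable();
foreach (int pageIndex in new[] { 0, 1, 9, 10 })
{
    var r = RoughPage(rows, pageIndex, pageSize: 25);
    Console.WriteLine($"page #{pageIndex + 1}: counted {r.RowCount}, rough = {r.IsRoughCount}");
}
```

Against 100,000 rows, pages #1, #2 and #10 each count 251 rows (rough), while page #11 counts 501.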

The idea is that the pager will show something like:

"1 2 3 4 5 6 7 8 9 10..." while the row-count display checks isRoughCount and shows "250+" rather than "251" when isRoughCount is true.

If and when a user selects "..." to load page #11, the maxPageCount calculation becomes:

    page #11 (10) / 10 = 1. (1 + 1) * 10 = 20.

This makes roughCountLimit 501, so the count covers up to 21 pages' worth of rows (20 full pages plus one extra row). If the database happened to return only 251 records, page #11 would still display with the 1 remaining record, and since isRoughCount would be false, the row count would update to display "251". Otherwise, the row count would update to display "500+". If the user keeps navigating with "...", the rough count limit keeps increasing. This makes the count query gradually slower, but for the first few sets of pages it retrieves counts significantly faster than a full count.
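A hypothetical walk-through of this behaviour, covering both the pager display ("1 2 3 4 5 6 7 8 9 10...") and the count label ("250+" versus "251") described above. `RowCountLabel`, `PagerLinks` and `Walk` are illustrative names, not part of any library; the pager here only shows the current set of 10 links (no "Last" link), and an in-memory `IQueryable` of `totalRows` integers stands in for the filtered query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// "250+" (more than 250 rows) when the count hit the rough limit, else the exact count.
static string RowCountLabel(int rowCount, bool isRoughCount) =>
    isRoughCount ? $"{rowCount - 1}+" : rowCount.ToString();

// 1-based links for the current set of pages; "..." means more rows may exist.
static string PagerLinks(int page, int rowCount, bool isRoughCount, int pageSize, int pagesShown)
{
    int firstLink = page / pagesShown * pagesShown;            // 0, 10, 20, ...
    int pagesKnown = (rowCount + pageSize - 1) / pageSize;     // ceiling division
    int linkCount = Math.Max(0, Math.Min(firstLink + pagesShown, pagesKnown) - firstLink);
    string links = string.Join(" ", Enumerable.Range(firstLink + 1, linkCount));
    return isRoughCount ? links + "..." : links;
}

// Simulates a user clicking "..." (page #1, #11, #21, ...) over 'totalRows' filtered rows.
static List<(int PageNumber, int RowsShown, string Links, string Label)> Walk(
    int totalRows, int pageSize = 25, int pagesShown = 10)
{
    var set = Enumerable.Range(1, totalRows).AsQueryable();
    var steps = new List<(int PageNumber, int RowsShown, string Links, string Label)>();
    for (int page = 0; page * pageSize < totalRows; page += pagesShown)
    {
        int maxPageCount = (page / pagesShown + 1) * pagesShown;
        int roughCountLimit = pageSize * maxPageCount + 1;    // 251, 501, 751, ...
        int rowCount = set.Take(roughCountLimit).Count();
        bool isRoughCount = rowCount == roughCountLimit;
        int rowsShown = set.Skip(page * pageSize).Take(pageSize).Count();
        steps.Add((page + 1, rowsShown,
            PagerLinks(page, rowCount, isRoughCount, pageSize, pagesShown),
            RowCountLabel(rowCount, isRoughCount)));
    }
    return steps;
}

foreach (var step in Walk(251))
    Console.WriteLine($"page #{step.PageNumber}: {step.RowsShown} row(s), pager \"{step.Links}\", count \"{step.Label}\"");
```

With 251 rows, page #1 shows "250+" and page #11 shows 1 row with the exact count "251"; with 600 rows the label goes "250+", "500+", then "600".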

The key thing about pagination and searching is that users should have the tools to find the data they want, typically on the first page or maybe the first handful of pages of results. They should almost never need to navigate through 10 pages of results, let alone more than 10 (if they do, that's an indication that you need better searching/filtering capabilities). At the same time, even with really good searching over really large data sets, a user generally won't care whether there are 5,000 rows or 500,000,000 rows. We can greatly speed up querying by reporting that there are "at least" 250 rows, then expanding on that if and only if it is needed. The row count can be displayed as a hyperlink that runs a specific full count query if the user needs, or is simply curious about, the exact 504,231,188 row count. That (expensive) fact doesn't need to be part of every query.

It is not possible to hit the database just once to get both the number of objects and the objects themselves. If you want to do pagination, both queries are required. Link to similar question.

The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM