
Keeping state in a Django view to improve performance for pagination

I'm designing a data-tables-driven Django app and have an API view that data-tables calls with AJAX (I'm using data-tables in its server-side processing mode). It implements searching, pagination, and ordering.

My database recently got large (about 500,000 entries) and performance has greatly suffered, both for searches and for simply moving to the next page. I suspect that the way I wrote the view is grossly inefficient. Here's what I do in the view (suppose the objects in my database are pizzas):

  • filtered = Pizza.objects.filter(...) to get the set of pizzas that match the search criteria (or Pizza.objects.all() if there are no search criteria).

  • paginated = filtered[start: start + length] to get only the current page of pizzas (at most 100 of them). start and length are passed in from the data-tables client-side code, according to what page the user is on.

  • pizzas = paginated.order_by(...) to apply the ordering to the current page.

Then I convert pizzas into JSON and return them from the view.
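The three steps above can be sketched with plain lists standing in for querysets (the pizza data here is made up for illustration). It also shows why the order of steps 2 and 3 matters: ordering *after* slicing only orders whatever rows happen to be on the page, not the whole result set.

```python
# Plain-list stand-in for the queryset pipeline described above.
# `rows` plays the role of Pizza.objects.filter(...); the real app
# uses the ORM, but the slicing/ordering logic is the same.
rows = [{"name": n, "price": p} for n, p in
        [("margherita", 8), ("diavola", 11), ("quattro", 10), ("hawaii", 9)]]

start, length = 0, 2  # first page, two rows per page

# Step 2 then step 3, as in the question: slice first, order second.
paginated = rows[start:start + length]
page_then_order = sorted(paginated, key=lambda r: r["price"])

# Ordering first gives a different (and usually the intended) page.
order_then_page = sorted(rows, key=lambda r: r["price"])[start:start + length]

print([r["name"] for r in page_then_order])   # the first two rows, re-sorted
print([r["name"] for r in order_then_page])   # the two cheapest pizzas overall
```

The two results differ: the first is just the original first page sorted internally, while the second is page one of the globally sorted list.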

It seems that, while search might justifiably be a slow operation on 500,000 entries, simply moving to the next page shouldn't require us to redo the whole search. So what I was thinking of doing was caching some things in the view (it's a class-based view). I would keep track of what the last search string was, along with the set of results it produced.

Then, if a request comes through and the search string hasn't changed (which is what happens when the user is clicking through a few pages of results), I don't have to hit the database again to get the filtered results -- I can just use the cached version.

It's a read-only application, so getting out of sync would not be an issue.

I could even keep a dictionary mapping a whole bunch of search strings to the pizzas they should produce.

What I'd like to know is: is this a reasonable solution to the problem? Or is there something I'm overlooking? Also, am I reinventing the wheel here? Not that this wouldn't be easy to implement, but is there a built-in option on QuerySet or something else that already does this?
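A minimal sketch of the per-search-string dictionary idea, with hypothetical names (`PizzaListView`, `fetch_filtered`, `get_page`) and a plain list standing in for the database:

```python
# Sketch of caching filtered results per search string in a view object.
# fetch_filtered() stands in for the expensive Pizza.objects.filter(...) call.
class PizzaListView:
    search_results = {}  # class-level: survives across requests to this view

    def fetch_filtered(self, search):
        # Placeholder for the real ORM query; counts calls so the
        # caching behaviour is observable.
        self.db_hits = getattr(self, "db_hits", 0) + 1
        data = ["margherita", "diavola", "quattro", "hawaii"]
        return [p for p in data if search in p]

    def get_page(self, search, start, length):
        if search not in self.search_results:          # miss: hit the "database"
            self.search_results[search] = self.fetch_filtered(search)
        return self.search_results[search][start:start + length]

view = PizzaListView()
page1 = view.get_page("a", start=0, length=2)  # database hit
page2 = view.get_page("a", start=2, length=2)  # served from the cache
print(page1, page2, view.db_hits)              # second page costs no extra hit
```

One caveat: in a real deployment each worker process holds its own copy of such a dict, and it grows without bound, which is why shared caches such as memcached or redis tend to come up for this problem.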

pizzas = paginated.order_by(...) is slow: it sorts all Pizzas, not just the current page. Apply order_by() before slicing, so the database can sort and then apply LIMIT/OFFSET (Django will in fact refuse to reorder a queryset once a slice has been taken). Indexes also help: https://docs.djangoproject.com/en/1.8/topics/db/optimization/#use-standard-db-optimization-techniques
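The optimization docs linked above largely come down to indexing the columns you filter and order on. A sketch of what that might look like on the model (the field names here are illustrative, not from the question):

```python
from django.db import models

class Pizza(models.Model):
    # db_index=True makes the database build an index on this column,
    # so ORDER BY name ... LIMIT/OFFSET can avoid a full table scan.
    name = models.CharField(max_length=100, db_index=True)
    price = models.DecimalField(max_digits=5, decimal_places=2, db_index=True)
```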

If you really want caching, check out https://github.com/Suor/django-cacheops , "A slick app that supports automatic or manual queryset caching and automatic granular event-driven invalidation."

There are multiple ways to improve your code structure:

First, fetch only the data required for the requested page in a single Django ORM hit; second, cache the ORM output and reuse that result when the same query is issued again.

The first approach goes like this.

In your code:

With Pizza.objects.all() followed by paginated = filtered[start: start + length], if the queryset has already been evaluated you end up fetching all rows and slicing them in Python, which is very expensive. Slice the lazy queryset instead, so Django issues a single query:

filtered = Pizza.objects.all()[(page_number - 1) * 30:(page_number - 1) * 30 + 30]

The ORM call above fetches only the rows for the supplied page number (via SQL LIMIT/OFFSET) and is very fast compared to fetching everything and then slicing.
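The slice bounds map directly onto SQL OFFSET/LIMIT, so the arithmetic can be factored out into a small helper (the page size of 30 matches the snippet above; page numbers are 1-based):

```python
def page_bounds(page_number, page_size=30):
    """Return (start, stop) slice bounds for a 1-based page number.

    Pizza.objects.all()[start:stop] then becomes, at the SQL level,
    SELECT ... LIMIT page_size OFFSET start.
    """
    start = (page_number - 1) * page_size
    return start, start + page_size

print(page_bounds(1))  # (0, 30)
print(page_bounds(3))  # (60, 90)
```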

The second way: first fetch the data for a query and put it into a caching solution such as memcached or redis. The next time you need that data, check whether a result for the query is already in the cache; if it is, simply use it. In-memory caches are far faster than hitting the database, because the database has to move a large amount of data between memory and disk, and disks are traditionally slow.
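This is the classic cache-aside pattern. A sketch with a plain dict standing in for memcached/redis (`run_query` and `cached_query` are hypothetical names, and the pizza data is made up):

```python
# Cache-aside pattern with a dict standing in for memcached/redis.
# run_query() is a placeholder for the real database hit.
cache = {}
db_hits = 0

def run_query(search):
    global db_hits
    db_hits += 1
    return [p for p in ["margherita", "diavola", "quattro"] if search in p]

def cached_query(search):
    key = f"pizza:{search}"      # derive the cache key from the query
    if key in cache:             # hit: skip the database entirely
        return cache[key]
    result = run_query(search)   # miss: run the query ...
    cache[key] = result          # ... and store it for next time
    return result

first = cached_query("rita")   # database hit
second = cached_query("rita")  # cache hit, no extra database work
print(first, second, db_hits)
```

With a real memcached or redis backend you would also set a TTL on each key so the cache cannot grow without bound.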
