简体繁体 English

什么是最佳的数据库缓存用于此应用程序？

[英]What would be the best DB cache to use for this application?

原文 2011-11-29 20:30:49 2 3 mysql/ database/ caching/ memcached

I am about 70% of the way through developing a web application which contains what is essentially a largeish datatable of around 50,000 rows. 我开发Web应用程序的方式大约占70％，该应用程序实质上包含大约50,000行的大型数据表。

The app itself is a filtering app providing various different ways of filtering this table such as range filtering by number, drag and drop filtering that ultimately performs regexp filtering, live text searching and i could go on and on. 该应用程序本身是一个过滤应用程序，提供了各种不同的方式来过滤该表，例如按数字进行范围过滤，拖放式过滤（最终执行正则表达式过滤），实时文本搜索，我可以继续进行下去。

Due to this I coded my MySQL queries in a modular fashion so that the actual query itself is put together dynamically dependant on the type of filtering happening. 因此，我以模块化的方式对MySQL查询进行了编码，以便实际查询本身可以根据发生的过滤类型动态地组合在一起。

At the moment each filtering action (in total) takes between 250-350ms on average. 目前，每个过滤操作（总计）平均需要250-350ms。 For example:- 例如：-

The user grabs one end of a visual slider, drags it inwards, when he/she lets go a range filtering query is dynamically put together by my PHP code and the results are returned as a JSON response. 用户放开可视滑块的一端，然后将其向内拖动，当他/她放手时，范围过滤查询将由我的PHP代码动态组合在一起，结果将作为JSON响应返回。 The total time from the user letting go of the slider until the user has recieved all data and the table is redrawn is between 250-350ms on average. 从用户放开滑块直到获得所有数据并重新绘制表格的总时间平均为250-350ms。

I am concerned with scaleability further down the line as users can be expected to perform a huge number of the filtering actions in a short space of time in order to retrieve the data they are looking for. 我担心可扩展性会进一步下降，因为可以期望用户在短时间内执行大量筛选操作，以便检索他们要查找的数据。

I have toyed with trying to do some fancy cache expiry work with memcached but couldn't get it to play ball correctly with my dynamically generated queries. 我曾尝试用memcached做一些花哨的缓存过期工作，但无法通过动态生成的查询正确地发挥作用。 Although everything would cache correctly I was having trouble expiring the cache when the query changes and keeping the data relevent. 尽管一切都会正确缓存，但是当查询更改并保持数据相关性时，我在使缓存过期时遇到了麻烦。 I am however extremely inexperienced with memcached. 但是，我对memcached非常缺乏经验。 My first few attempts have led me to believe that memcached isn't the right tool for this job (due to the highly dynamic nature of the queries. Although this app could ultimately see very high concurrent usage. 我的前几次尝试使我相信memcached不是适合此工作的工具（由于查询的高度动态性质。尽管此应用最终可能会看到很高的并发使用率。

So... My question really is, are there any caching mechanisms/layers that I can add to this sort of application that would reduce hits on the server? 所以...我的问题确实是，我可以向此类应用程序添加任何缓存机制/层来减少服务器上的访问量吗？ Bearing in mind the dynamic queries. 牢记动态查询。

Or... If memcached is the best tool for the job, and I am missing a piece of the puzzle with my early attempts, can you provide some information or guidance on using memcached with an application of this sort? 或者...如果memcached是完成这项工作的最佳工具，而我在早期尝试中却遗漏了一个难题，那么您能否提供有关在此类应用程序中使用memcached的一些信息或指导？

Huge thanks to all who respond. 非常感谢所有回应。

EDIT: I should mention that the database is MySQL. 编辑：我应该提到数据库是MySQL。 The siite itself is running on Apache with an nginx proxy. siite本身在带有nginx代理的Apache上运行。 But this question is related purely to speeding up and reducing the database hits, of which there are many. 但是这个问题纯粹与加速和减少数据库命中有关，其中有很多。

I should also add that the quoted 250-350ms roundtrip time is fully remote. 我还应该补充说，引用的250-350ms往返时间是完全远程的。 As in from a remote computer accessing the website. 就像从远程计算机访问网站一样。 The time includes DNS lookup, Data retrieval etc. 时间包括DNS查找，数据检索等。

3 个解决方案

If I understand your question correctly, you're essentially asking for a way to reduce the number of queries against the database eventhough there will be very few exactly the same queries. 如果我正确地理解了您的问题，那么您实际上是在寻求一种减少针对数据库的查询数量的方法，尽管几乎没有完全相同的查询。

You essentially have three choices: 您基本上有三个选择：

Live with having a large amount of queries against your database, optimise the database with appropriate indexes and normalise the data as far as you can. 可以对数据库进行大量查询，使用适当的索引优化数据库，并尽可能地规范化数据。 Make sure to avoid normal performance pitfalls in your query building (lots of ORs in ON-clauses or WHERE-clauses for instance). 确保避免在查询构建中出现正常的性能陷阱（例如，ON子句或WHERE子句中的许多OR）。 Provide views for mashup queries, etc. 提供用于混搭查询等的视图。
Cache the generic queries in memcached or similar, that is, without some or all filters. 将通用查询缓存在memcached或类似的缓存中，即没有某些或所有过滤器。 And apply the filters in the application layer. 并在应用程序层中应用过滤器。
Implement a search index server, like SOLR. 实现搜索索引服务器，例如SOLR。

I would recommend you do the first though. 我建议您尽管先做。 A roundtrip time of 250~300 ms sounds a bit high even for complex queries and it sounds like you have a lot to gain by just improving what you already have at this stage. 即使对于复杂的查询，往返时间为250〜300毫秒也听起来有点高，这听起来像您只是通过改善此阶段已经拥有的内容而获益匪浅。 For much higher workloads, I'd suggest solution number 3, it will help you achieve what you are trying to do while being a champ at handling lots of different queries. 对于更高的工作负载，我建议使用解决方案3，它将帮助您实现正在尝试做的事情，同时还可以处理许多不同的查询。

Use Memcache and set the key to be the filtering query or some unique key based on the filter. 使用Memcache并将键设置为过滤查询或基于过滤器的某些唯一键。 Ideally you would write your application to expire the key as new data is added. 理想情况下，您将编写应用程序以使密钥在添加新数据时过期。

You can only make good use of caches when you occasionally run the same query. 仅在偶尔运行同一查询时，才可以充分利用缓存。
A good way to work with memcache caches is to define a key that matches the function that calls it. 使用内存缓存的一种好方法是定义一个与调用它的函数匹配的键。 For example, if the model named UserModel has a method getUser($userID) , you could cache all users as USER_id . 例如，如果名为UserModel的模型具有方法getUser($userID) ，则可以将所有用户缓存为USER_id 。 For more advanced functions ( Model2::largerFunction($arg1, $arg2) ) you can simply use MODEL2_arg1_arg2 - this will make it easy to avoid namespace conflicts. 对于更高级的功能（ Model2::largerFunction($arg1, $arg2) ），您可以简单地使用MODEL2_arg1_arg2这将很容易避免命名空间冲突。
For fulltext searches, use a search indexer such as Sphinx or Apache Lucene. 对于全文搜索，请使用搜索索引器，例如Sphinx或Apache Lucene。 They improve your queries a LOT (I was able to do a fulltext search on a 10 million record table on a 1.6 GHz atom processor, in less than 500 ms). 他们改善了您的查询量（我能够在不到500毫秒的时间内对1.6 GHz原子处理器上的1000万条记录表进行全文搜索）。