
Best approach to select most viewed posts from last n hours

I'm using PHP and MySQL (InnoDB engine).

As the MySQL reference says, a query that filters with a comparison on one column and orders by another cannot fully use the index we have in mind.

I have a table named News.

This table has at least 1 million records and two important columns: time_added and number_of_views.

I need to select the most viewed records from the last n hours. What is the best index for this? And is it even possible to run this kind of query very fast on a table with millions of records?

I've already done this for "last day": I can select the most viewed records from the last day by adding a new column (date_added). But if I decide to select these records from the last week, I'm in trouble again.

First, write the query:

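select n.*
from news n
where n.time_added >= date_sub(now(), interval 24 hour)  -- put your n in place of 24; the unit keyword is HOUR, not HOURS
order by n.number_of_views desc
limit 10;  -- put the number of rows you want in place of 10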

The best index is (time_added, number_of_views). Actually, number_of_views won't be used for this query, because the range condition on time_added keeps the index from supplying the sort order, but I would include it for other possible queries.
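As a minimal sketch of that suggestion, the statements below create the composite index and ask MySQL how it plans to run the query. The index name idx_time_added_views, the 24-hour window, and the limit of 10 are assumptions made for illustration; they do not come from the question.

```sql
-- Hypothetical index name; the column order follows the answer above.
ALTER TABLE news
    ADD INDEX idx_time_added_views (time_added, number_of_views);

-- Inspect the plan. With a range condition on time_added, the optimizer
-- can use the index to find the rows but will likely still sort them
-- (look for "Using filesort" in the Extra column).
EXPLAIN
SELECT n.*
FROM news n
WHERE n.time_added >= DATE_SUB(NOW(), INTERVAL 24 HOUR)
ORDER BY n.number_of_views DESC
LIMIT 10;
```

If the window covers a large share of the table, the optimizer may prefer a full scan over the index, so it is worth checking the plan for each value of n you care about.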

First, on MySQL 5.7 or earlier (the query cache was removed in MySQL 8.0), add the following lines to my.cnf, in the [mysqld] section:

[mysqld]
query_cache_size = 32M    # or more
query_cache_limit = 32M   # or more

query_cache_size sets the size of the cache.

The other option to pay attention to is query_cache_limit: it sets the maximum size of a query result that can be stored in the cache. To check the status of the cache, run:

show global status like 'Qcache%';

http://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to look up rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3). For more information, see http://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html
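The leftmost-prefix rule quoted above can be illustrated with a small sketch. The table t, its id and payload columns, and the index name are hypothetical and exist only to make the example self-contained.

```sql
-- Hypothetical table with a three-column index, as in the quoted passage.
CREATE TABLE t (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    col1 INT NOT NULL,
    col2 INT NOT NULL,
    col3 INT NOT NULL,
    payload VARCHAR(100),
    INDEX idx_c1_c2_c3 (col1, col2, col3)
) ENGINE=InnoDB;

-- These conditions form a leftmost prefix, so the index can be used for lookups.
SELECT * FROM t WHERE col1 = 1;
SELECT * FROM t WHERE col1 = 1 AND col2 = 2;
SELECT * FROM t WHERE col1 = 1 AND col2 = 2 AND col3 = 3;

-- These skip col1, so they are not a leftmost prefix of the index
-- and cannot use it for an ordinary index lookup.
SELECT * FROM t WHERE col2 = 2;
SELECT * FROM t WHERE col2 = 2 AND col3 = 3;
```

Running EXPLAIN on each statement is a quick way to confirm which ones the optimizer actually resolves through the index on your server version.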

You need a summary table. Since 'hour' is your granularity, something like this might work:

CREATE TABLE HourlyViews (
    the_hour DATETIME NOT NULL,
    ct SMALLINT UNSIGNED NOT NULL,
    PRIMARY KEY(the_hour)
) ENGINE=InnoDB;

It might need another column (added to the PK as well) if there is some breakdown of the items you are counting. And you might want some other things SUM'd or COUNT'd in this table.

Build and maintain this table incrementally. That is, every hour, add another row to the table. (Or you could keep it updated with INSERT .. ON DUPLICATE KEY UPDATE ..)
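A minimal sketch of the INSERT .. ON DUPLICATE KEY UPDATE variant, assuming one statement is run per recorded view against the HourlyViews table defined above. Truncating NOW() to the hour with DATE_FORMAT is an illustrative choice, not something the answer specifies.

```sql
-- Creates the row for the current hour on the first view,
-- then increments its counter on every later view in that hour.
INSERT INTO HourlyViews (the_hour, ct)
VALUES (DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00'), 1)
ON DUPLICATE KEY UPDATE ct = ct + 1;
```

Note that ct is declared SMALLINT UNSIGNED, which tops out at 65535; a wider integer type may be needed if an hour can see more views than that.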

More on Summary Tables

Then change the query to use that table; it will be a lot faster.
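What the rewritten query might look like depends on the breakdown you store. The sketch below assumes a per-post variant of the summary table; the table HourlyPostViews, its news_id column, the 24-hour window, and the limit of 10 are all hypothetical. It also ranks posts by views received within the window, which is a different reading from "posts added within the window".

```sql
-- Hypothetical per-post summary table: one row per post per hour.
CREATE TABLE HourlyPostViews (
    news_id INT UNSIGNED NOT NULL,
    the_hour DATETIME NOT NULL,
    ct INT UNSIGNED NOT NULL,
    PRIMARY KEY (the_hour, news_id)
) ENGINE=InnoDB;

-- Most viewed posts over the last 24 hourly buckets.
-- The range on the_hour uses the leading column of the primary key.
SELECT news_id, SUM(ct) AS views
FROM HourlyPostViews
WHERE the_hour >= DATE_FORMAT(NOW() - INTERVAL 24 HOUR, '%Y-%m-%d %H:00:00')
GROUP BY news_id
ORDER BY views DESC
LIMIT 10;
```

The query reads at most one row per post per hour in the window instead of scanning the full News table, which is where the speedup would come from.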
