
Best way to store views / stats in MySQL

I'm working on a site which stores individual page views in a 'views' table:

CREATE TABLE `views` (
  `view_id` bigint(16) NOT NULL auto_increment,
  `user_id` int(10) NOT NULL,
  `user_ip` varchar(15) NOT NULL,
  `view_url` varchar(255) NOT NULL,
  `view_referrer` varchar(255) NOT NULL,
  `view_date` date NOT NULL,
  `view_created` int(10) NOT NULL,
  PRIMARY KEY  (`view_id`),
  KEY `view_url` (`view_url`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

It's pretty basic: it stores user_id (the user's id on the site), their IP address, the URL (without the domain, to reduce the size of the table a little), the referral URL (not really used right now, and I might get rid of it), the date (in YYYY-MM-DD format, of course), and the Unix timestamp of when the view occurred.

The table, of course, is getting rather big (4 million rows at the moment, and it's a rather young site), and running queries on it is slow.

For some basic optimization I've now created a 'views_archive' table:

CREATE TABLE `views_archive` (
  `archive_id` bigint(16) NOT NULL auto_increment,
  `view_url` varchar(255) NOT NULL,
  `view_count` smallint(5) NOT NULL,
  `view_date` date NOT NULL,
  PRIMARY KEY  (`archive_id`),
  KEY `view_url` (`view_url`),
  KEY `view_date` (`view_date`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

This ignores the user info (and referral URL) and stores how many times a URL was viewed per day. This is probably how we'll generally want to use the data (how many times a page was viewed on a given day), so it should make querying pretty quick. I imagine I could use it to mostly replace the 'views' table: show page views by hour for the last week/month or so, show daily views beyond that, and keep only the last week/month of data in 'views'. Even so, it's still a large table.
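For anyone following along, here is a minimal sketch of how such a daily roll-up could work. It is not code from the site: SQLite (via Python's sqlite3) stands in for MySQL so it runs anywhere, the tables are trimmed to the columns that matter, and the function name `archive_views_before` and the sample rows are made up for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the INSERT ... SELECT ... GROUP BY
# statement is plain SQL. Tables are cut down to the relevant columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE views (
        view_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id   INTEGER NOT NULL,
        view_url  TEXT NOT NULL,
        view_date TEXT NOT NULL
    );
    CREATE TABLE views_archive (
        archive_id INTEGER PRIMARY KEY AUTOINCREMENT,
        view_url   TEXT NOT NULL,
        view_count INTEGER NOT NULL,
        view_date  TEXT NOT NULL
    );
""")


def archive_views_before(conn, cutoff_date):
    """Roll raw views older than cutoff_date into daily counts, then delete them.

    Hypothetical helper, not part of the original question.
    """
    with conn:  # one transaction: both statements apply, or neither does
        conn.execute(
            """
            INSERT INTO views_archive (view_url, view_count, view_date)
            SELECT view_url, COUNT(*), view_date
            FROM views
            WHERE view_date < ?
            GROUP BY view_url, view_date
            """,
            (cutoff_date,),
        )
        conn.execute("DELETE FROM views WHERE view_date < ?", (cutoff_date,))


# Illustrative data only.
conn.executemany(
    "INSERT INTO views (user_id, view_url, view_date) VALUES (?, ?, ?)",
    [
        (1, "/about", "2010-01-01"),
        (2, "/about", "2010-01-01"),
        (1, "/contact", "2010-01-01"),
        (3, "/about", "2010-01-02"),
    ],
)
conn.commit()

archive_views_before(conn, "2010-01-02")
archived = conn.execute(
    "SELECT view_url, view_count, view_date FROM views_archive ORDER BY view_url"
).fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM views").fetchone()[0]
```

One caveat: MyISAM tables, as used in the schema above, are not transactional, so on MySQL the insert and the delete would not be atomic unless the tables were switched to a transactional engine.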

Anyway, long story short, I'm wondering if you can give me any tips on how best to handle the storage of stats/page views for a MySQL-backed site. The goal is to keep the table(s) in the db as small as possible while still being able to query the info easily (and at least relatively quickly). I've looked at partitioned tables a little, but the site doesn't have MySQL 5.1 installed. Any other tips or thoughts you could offer would be much appreciated.

MySQL's Archive Storage Engine

http://dev.mysql.com/tech-resources/articles/storage-engine.html

It is great for log tables: writes are quick, and the one downside is that reads are a bit slower.

You probably want to have a table just for pages, and have each user view reference a row in that table. Another possible optimization would be to store the user IP in a different table, perhaps alongside some session information. That should reduce your query times somewhat. You're on the right track with the archive table; the same optimizations should help there as well.
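A rough sketch of that layout, to make it concrete. This is only one way to do it: SQLite (via Python's sqlite3) stands in for MySQL, and the `pages` table, its `page_id`/`page_url` columns, and the helpers `get_page_id` and `record_view` are names invented for the example.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL. The pages table and both helper
# functions are hypothetical names, not something from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pages (
        page_id  INTEGER PRIMARY KEY AUTOINCREMENT,
        page_url TEXT NOT NULL UNIQUE
    );
    CREATE TABLE views (
        view_id      INTEGER PRIMARY KEY AUTOINCREMENT,
        page_id      INTEGER NOT NULL REFERENCES pages (page_id),
        user_id      INTEGER NOT NULL,
        view_created INTEGER NOT NULL
    );
    CREATE INDEX views_page_id ON views (page_id);
""")


def get_page_id(conn, url):
    """Return the id for url, creating the pages row the first time it is seen."""
    # INSERT OR IGNORE is SQLite syntax; MySQL spells it INSERT IGNORE.
    conn.execute("INSERT OR IGNORE INTO pages (page_url) VALUES (?)", (url,))
    row = conn.execute(
        "SELECT page_id FROM pages WHERE page_url = ?", (url,)
    ).fetchone()
    return row[0]


def record_view(conn, url, user_id, created):
    """Store one view as a small integer reference instead of the full URL."""
    with conn:
        conn.execute(
            "INSERT INTO views (page_id, user_id, view_created) VALUES (?, ?, ?)",
            (get_page_id(conn, url), user_id, created),
        )


# Illustrative data only.
record_view(conn, "/about", 1, 1262304000)
record_view(conn, "/about", 2, 1262304060)
record_view(conn, "/contact", 1, 1262304120)

page_count = conn.execute("SELECT COUNT(*) FROM pages").fetchone()[0]
about_views = conn.execute(
    "SELECT COUNT(*) FROM views JOIN pages USING (page_id) WHERE page_url = ?",
    ("/about",),
).fetchone()[0]
```

The point of the design is that each view row carries an integer key rather than a varchar(255), so both the table and its index stay narrower.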

Assuming your application is a blog and you want to keep track of views for your blog posts, you will probably have a table called blog_posts. In this table, I suggest you create a column called "views", in which you store a cached count of how many views the post has. You will still use the views table, but only to keep a record of every view (and to check whether a view is "unique" or not).

Basically, when a user visits a blog post, the application checks the views table to see whether the view should be added. If so, it also increments the "views" field in the corresponding blog_posts row. That way, you can just read the "views" field for each post to see how many views it has. You can take this a step further and add redundancy by setting up a cron job that re-counts and verifies all the views and updates each blog_posts row accordingly at the end of the day. Or, if you prefer, you can perform a re-count on each update if to-the-second accuracy is key.
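Something along these lines, as a sketch rather than a finished implementation. SQLite (via Python's sqlite3) stands in for MySQL; "unique" is assumed here to mean one view per user per post, which is my own simplification; and `record_view` and `recount_views` are hypothetical names.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL, and a view counts as "unique" when
# this user has not viewed this post before. Adjust the rule to taste.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog_posts (
        post_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL,
        views   INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE views (
        view_id INTEGER PRIMARY KEY AUTOINCREMENT,
        post_id INTEGER NOT NULL,
        user_id INTEGER NOT NULL,
        UNIQUE (post_id, user_id)
    );
""")


def record_view(conn, post_id, user_id):
    """Log a view and bump the cached counter; return True if it was counted."""
    with conn:
        seen = conn.execute(
            "SELECT 1 FROM views WHERE post_id = ? AND user_id = ?",
            (post_id, user_id),
        ).fetchone()
        if seen is not None:
            return False  # already recorded, leave the counter alone
        conn.execute(
            "INSERT INTO views (post_id, user_id) VALUES (?, ?)",
            (post_id, user_id),
        )
        conn.execute(
            "UPDATE blog_posts SET views = views + 1 WHERE post_id = ?",
            (post_id,),
        )
        return True


def recount_views(conn):
    """Rebuild every cached counter from the views table (the nightly cron step)."""
    with conn:
        conn.execute(
            "UPDATE blog_posts SET views = "
            "(SELECT COUNT(*) FROM views AS v WHERE v.post_id = blog_posts.post_id)"
        )


# Illustrative data only.
conn.execute("INSERT INTO blog_posts (post_id, title) VALUES (1, 'Hello')")
conn.commit()

results = [record_view(conn, 1, 10), record_view(conn, 1, 10), record_view(conn, 1, 11)]
cached = conn.execute("SELECT views FROM blog_posts WHERE post_id = 1").fetchone()[0]

# Simulate a counter that has drifted, then repair it with the re-count.
conn.execute("UPDATE blog_posts SET views = 99 WHERE post_id = 1")
recount_views(conn)
recounted = conn.execute("SELECT views FROM blog_posts WHERE post_id = 1").fetchone()[0]
```

Reading the count is then a single-row lookup on blog_posts, and the re-count only has to run as often as you need the cached value to be verified.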

This solution works well if your site is read-intensive and you constantly need a count of how many views each blog post has (again, assuming that is your application :-))
