

Best practice to record a large number of hits into a MySQL database

Well, here's the situation. Let's say my future PHP CMS needs to handle 500k visitors daily, and I need to record them all in a MySQL database (referrer, IP address, time, etc.). That means inserting 300-500 rows per minute and updating about 50 more. The main problem is that the script would call the database every time I want to insert a new row, which is every time someone hits a page.

My question: is there any way to cache incoming hits locally first (and what is the best solution for that - APC, CSV...?) and then send them to the database periodically, for example every 10 minutes? Is this a good solution, and what is the best practice for this situation?

500k daily is just 5-7 queries per second on average. If each request is served in 0.2 sec, then you will have almost no simultaneous queries, so there is nothing to worry about.
Even if you get 5 times more users, everything should still work fine.
You can just use INSERT DELAYED and tune your MySQL.
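A minimal sketch of how that could look from PHP, assuming a hypothetical hits table with referrer, ip and hit_time columns (note that INSERT DELAYED only works with MyISAM, and was removed in MySQL 5.7, where it is treated as a normal INSERT):

    <?php
    // INSERT DELAYED returns immediately and lets MySQL queue the row
    // internally instead of blocking the page request on the write.
    $db  = new mysqli('localhost', 'user', 'pass', 'stats');
    $ref = $db->real_escape_string(isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '');
    $ip  = $db->real_escape_string($_SERVER['REMOTE_ADDR']);
    $db->query("INSERT DELAYED INTO hits (referrer, ip, hit_time)
                VALUES ('$ref', '$ip', NOW())");
    ?>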
About tuning: http://www.day32.com/MySQL/ - there is a very useful script there (it will change nothing, just show you tips on how to optimize your settings).

You can use memcache or APC to write the log there first, but with INSERT DELAYED MySQL will do almost the same work, and will do it better :)

Do not use files for this. The DB will handle locks much better than PHP. It's not trivial to write effective mutexes, so let the DB (or memcached, APC) do this work.

A frequently used solution:

You could implement a counter in memcached which you increment on each visit, and push an update to the database for every 100 (or 1000) hits.
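A rough sketch of that counter, assuming the PHP Memcached extension and a hypothetical flush_hits_to_db() helper that performs the aggregated write:

    <?php
    // Count hits in memcached; only every 100th hit touches MySQL.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    $mc->add('hit_counter', 0);            // no-op if the key already exists
    $hits = $mc->increment('hit_counter'); // atomic across PHP processes

    if ($hits !== false && $hits % 100 === 0) {
        // Hypothetical helper: e.g. one UPDATE that adds 100 to a hits column
        // instead of 100 separate INSERTs.
        flush_hits_to_db(100);
    }
    ?>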

We do this by storing hits locally on each server in a CSV file, then having a per-minute cron job push the entries into the database. This is to avoid needing a highly available MySQL database more than anything - the database should be able to cope with that volume of inserts without a problem.
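The per-request side can be as simple as an atomic append; a sketch, assuming a writable spool directory:

    <?php
    // FILE_APPEND | LOCK_EX makes the append atomic even with many
    // concurrent PHP processes writing to the same file.
    $line = sprintf("%s,%s,%s\n",
        date('Y-m-d H:i:s'),
        $_SERVER['REMOTE_ADDR'],
        str_replace(',', ' ', isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '')
    );
    file_put_contents('/var/spool/hits/hits.csv', $line, FILE_APPEND | LOCK_EX);
    ?>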

Save them to a directory-based database (or flat file, it depends) somewhere, and at a certain time use PHP code to insert/update them into your MySQL database. Your PHP code can be executed periodically using cron, so check whether your server has cron so that you can set up the schedule, say every 10 minutes.
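The cron side could look like this sketch, which renames the spool file first so new hits keep flowing into a fresh file, then bulk-loads the batch (paths and table name are assumptions; note that LOAD DATA INFILE requires the FILE privilege and reads from the MySQL server's filesystem):

    <?php
    // Run from cron, e.g. */10 * * * *
    $spool = '/var/spool/hits/hits.csv';
    $batch = '/var/spool/hits/hits.' . time() . '.csv';

    if (file_exists($spool) && rename($spool, $batch)) {
        $db = new mysqli('localhost', 'user', 'pass', 'stats');
        $db->query(
            "LOAD DATA INFILE '" . $db->real_escape_string($batch) . "'
             INTO TABLE hits
             FIELDS TERMINATED BY ','
             (hit_time, ip, referrer)"
        );
        unlink($batch); // remove the processed batch
    }
    ?>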

Have a look at this page: http://damonparker.org/blog/2006/05/10/php-cron-script-to-run-automated-jobs/ . Some code has already been written and is ready for you to use :)

One way would be to use the Apache access.log. You can get quite fine-grained logging by using the cronolog utility with Apache. Cronolog will handle the storage of a very large number of rows in files, and can rotate them based on day, year, etc. Using this utility will also prevent your Apache from suffering from log writes.

Then, as others have said, use a cron-based job to analyse these logs and push whatever summarized or raw data you want into MySQL.
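A sketch of such a parser, assuming Apache's standard "combined" log format (the file path is just an example):

    <?php
    // Pull ip, timestamp, URL and referrer out of each combined-format line.
    $re = '/^(\S+) \S+ \S+ \[([^\]]+)\] "(?:\S+) (\S+)[^"]*" \d+ \S+ "([^"]*)"/';
    $stats = array();
    foreach (file('/var/log/apache2/access.log') as $line) {
        if (preg_match($re, $line, $m)) {
            list(, $ip, $time, $url, $referrer) = $m;
            // Aggregate in memory, then do one bulk INSERT at the end
            // instead of one query per log line.
            if (!isset($stats[$url])) { $stats[$url] = 0; }
            $stats[$url]++;
        }
    }
    ?>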

You may think of using a dedicated database (or even a dedicated database server) with specific settings for write-intensive jobs. For example, you may not need InnoDB storage and can keep simple MyISAM tables. And you could even think of another database storage engine (as said by @Riccardo Galli).

If you absolutely HAVE to log directly to MySQL, consider using two databases: one optimized for quick inserts, which means no keys other than possibly an auto_increment primary key, and another with keys on everything you'd be querying for, optimized for fast searches. A timed job would copy hits from the insert-only database to the read-optimized one on a regular basis, and you end up with the best of both worlds. The only drawback is that your available statistics will only be as fresh as the previous "copy" run.
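A sketch of that timed copy, assuming hypothetical fast_insert.hits and reporting.hits tables that share the same columns; using the current max id as a watermark avoids losing rows that arrive while the copy runs:

    <?php
    $db  = new mysqli('localhost', 'user', 'pass');
    $row = $db->query('SELECT MAX(id) FROM fast_insert.hits')->fetch_row();
    if ($row !== null && $row[0] !== null) {
        $max = (int) $row[0];
        // Copy everything up to the watermark, then delete only those rows.
        $db->query("INSERT INTO reporting.hits
                    SELECT * FROM fast_insert.hits WHERE id <= $max");
        $db->query("DELETE FROM fast_insert.hits WHERE id <= $max");
    }
    ?>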

I have also previously seen a system which records the data into a flat file on the local disc of each web server (be careful to do only atomic appends if using multiple processes), and periodically writes it into the database asynchronously using a daemon process or cron job.

This appears to be the prevailing optimum solution: your web app remains available if the audit database is down, and users don't suffer poor performance if the database is slow for any reason.

The only thing I can say is: be sure that you have monitoring on these locally-generated files - a build-up definitely indicates a problem, and your Ops engineers might not otherwise notice.

You could use a queue strategy with beanstalk or IronQ.
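A sketch of the beanstalk variant, assuming the pheanstalk client library (a common PHP client for beanstalkd) and a beanstalkd daemon on localhost; the tube name and payload fields are made up for the example:

    <?php
    require 'vendor/autoload.php';
    use Pheanstalk\Pheanstalk;

    // Enqueue the hit; putting a job is fast and never touches MySQL.
    $queue = new Pheanstalk('127.0.0.1');  // v3-style constructor
    $queue->useTube('hits')->put(json_encode(array(
        'ip'       => $_SERVER['REMOTE_ADDR'],
        'referrer' => isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '',
        'time'     => time(),
    )));
    // A separate worker process reserves jobs from the 'hits' tube and
    // writes them to MySQL in batches.
    ?>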

For this volume of writes and this kind of data, you might find MongoDB or CouchDB more suitable.
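For example, with the legacy PHP Mongo driver (the one current around the time of this thread), logging a hit is a one-liner; the database, collection and field names here are assumptions:

    <?php
    $mongo = new MongoClient();               // connects to localhost:27017
    $hits  = $mongo->selectDB('stats')->hits; // created on first insert
    $hits->insert(array(
        'ip'       => $_SERVER['REMOTE_ADDR'],
        'referrer' => isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '',
        'time'     => new MongoDate(),
    ), array('w' => 0)); // w=0: unacknowledged, fire-and-forget write
    ?>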

Because INSERT DELAYED is only supported by MyISAM, it is not an option for many users.

We use MySQL Proxy to defer the execution of queries matching a certain signature.

This will require a custom Lua script; example scripts are here, and some tutorials are here.

The script will implement a queue data structure for storing the query strings, plus pattern matching to determine which queries to defer. Once the queue reaches a certain size, or a certain amount of time has elapsed, or whatever event X occurs, the queue is emptied and each query is sent to the server.
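The real script would be Lua running inside MySQL Proxy; as a rough illustration of the same queue-and-flush idea at the application level, here is a PHP sketch (the statement pattern, thresholds and class name are all made up):

    <?php
    class DeferredQueryQueue
    {
        private $queue = array();
        private $lastFlush;
        private $db;

        public function __construct(mysqli $db) {
            $this->db = $db;
            $this->lastFlush = time();
        }

        public function send($sql) {
            // Defer only statements matching the chosen signature.
            if (preg_match('/^INSERT INTO hits /i', $sql)) {
                $this->queue[] = $sql;
                // Flush on size or elapsed time, as described above.
                if (count($this->queue) >= 500 || time() - $this->lastFlush >= 60) {
                    $this->flush();
                }
                return true;
            }
            return $this->db->query($sql); // everything else runs immediately
        }

        public function flush() {
            foreach ($this->queue as $sql) {
                $this->db->query($sql);
            }
            $this->queue = array();
            $this->lastFlush = time();
        }
    }
    ?>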
