
Best practice to record a large number of hits in a MySQL database

Well, this is the thing. Let's say that my future PHP CMS needs to handle 500k visitors daily, and I need to record them all in a MySQL database (referrer, IP address, time, etc.). That way I need to insert 300-500 rows per minute and update 50 more. The main problem is that the script would call the database every time I want to insert a new row, which is every time someone hits a page.

My question: is there any way to cache incoming hits locally first (and what is the best solution for that: APC, CSV...?) and periodically send them to the database, every 10 minutes for example? Is this a good solution, and what is the best practice for this situation?

500k hits daily is just 5-7 queries per second (500,000 / 86,400 ≈ 5.8). If each request is served in 0.2 sec, then you will have almost 0 simultaneous queries, so there is nothing to worry about.
Even if you have 5 times more users, everything should work fine.
You can just use INSERT DELAYED and tune your MySQL.
About tuning: http://www.day32.com/MySQL/ - there is a very useful script there (it will change nothing, it just shows you tips on how to optimize your settings).
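
For illustration, a minimal sketch of that direct-logging path in PHP; the hits table, its columns, and the connection details are assumptions, and keep in mind that INSERT DELAYED works only with MyISAM (and was removed in later MySQL versions):

    <?php
    // Log a hit with INSERT DELAYED: the statement returns as soon as
    // the row is queued, and the server batches the actual writes.
    // Table name 'hits' and its columns are hypothetical.
    $pdo  = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
    $stmt = $pdo->prepare(
        'INSERT DELAYED INTO hits (ip, referrer, hit_time) VALUES (?, ?, NOW())'
    );
    $stmt->execute([$_SERVER['REMOTE_ADDR'], $_SERVER['HTTP_REFERER'] ?? '']);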

You can use memcache or APC to write the log there first, but with INSERT DELAYED MySQL will do almost the same work, and will do it better :)

Do not use files for this. The DB will handle locks much better than PHP. It's not so trivial to write effective mutexes, so let the DB (or memcache, APC) do this work.

A frequently used solution:

You could implement a counter in memcached which you increment on each visit, and push an update to the database for every 100 (or 1,000) hits.
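
A minimal sketch of that counter approach, assuming the Memcached PHP extension; the key name, the threshold of 100, and the page_stats table are illustrative:

    <?php
    // Count the visit in memcached instead of hitting MySQL directly.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    // increment() fails if the key does not exist yet, so seed it.
    if ($mc->increment('hit_counter') === false) {
        $mc->add('hit_counter', 1);
    }

    // Every 100 hits, flush one aggregated update to the database.
    // (The check-and-reset is not atomic; good enough as a sketch.)
    if ($mc->get('hit_counter') >= 100) {
        $mc->set('hit_counter', 0);
        $pdo = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
        $pdo->exec('UPDATE page_stats SET hits = hits + 100 WHERE page_id = 1');
    }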

We do this by storing hits locally on each server in CSV, then having a cron job that runs every minute to push the entries into the database. This is to avoid needing a highly available MySQL database more than anything; the database should be able to cope with that volume of inserts without a problem.
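
A sketch of what the per-request CSV append could look like; the file path and column order are assumptions. FILE_APPEND together with LOCK_EX keeps concurrent requests from interleaving their lines:

    <?php
    // Buffer one hit per line in a local CSV file.
    $line = implode(',', [
        time(),
        $_SERVER['REMOTE_ADDR'],
        // Keep the record on one line and in three columns.
        str_replace(["\n", ','], ' ', $_SERVER['HTTP_REFERER'] ?? ''),
    ]) . "\n";
    file_put_contents('/var/log/myapp/hits.csv', $line, FILE_APPEND | LOCK_EX);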

Save them to a directory-based database (or a flat file, it depends) somewhere, and at a certain time use PHP code to insert/update them into your MySQL database. Your PHP code can be executed periodically using cron, so check whether your server has cron so that you can set up the schedule, say every 10 minutes.

Have a look at this page: http://damonparker.org/blog/2006/05/10/php-cron-script-to-run-automated-jobs/ . Some code has already been written and is ready for you to use :)
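
For illustration, the importer such a cron job might run could look like this sketch; the paths and table layout are assumptions, matching the three-column CSV buffer sketched above. Renaming the buffer first lets requests keep appending to a fresh file while the snapshot is loaded:

    <?php
    // Swap the buffer out from under the web requests, then load it.
    $buffer   = '/var/log/myapp/hits.csv';
    $snapshot = $buffer . '.' . time();
    if (!file_exists($buffer)) {
        exit; // nothing buffered since the last run
    }
    rename($buffer, $snapshot);

    $pdo  = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
    $stmt = $pdo->prepare(
        'INSERT INTO hits (hit_time, ip, referrer) VALUES (FROM_UNIXTIME(?), ?, ?)'
    );
    foreach (file($snapshot, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $stmt->execute(explode(',', $line, 3));
    }
    unlink($snapshot);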

One way would be to use the Apache access.log. You can get quite fine-grained logging by using the cronolog utility with Apache. cronolog will handle the storage of a very big number of rows in files, and can rotate them by day, year, etc. Using this utility will prevent Apache from suffering from log writes.

Then, as said by others, use a cron-based job to analyse these logs and push whatever summarized or raw data you want into MySQL.
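
A rough sketch of what that cron-side parser could look like, assuming Apache's "combined" log format; the log path and the hits table are illustrative:

    <?php
    // Pull the IP, timestamp and referrer out of each log line.
    $pdo  = new PDO('mysql:host=localhost;dbname=stats', 'user', 'pass');
    $stmt = $pdo->prepare('INSERT INTO hits (ip, hit_time, referrer) VALUES (?, ?, ?)');
    $re   = '/^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" \d+ \S+ "([^"]*)"/';

    foreach (file('/var/log/apache/access.log') as $line) {
        if (!preg_match($re, $line, $m)) {
            continue; // skip lines that do not match the combined format
        }
        $time = DateTime::createFromFormat('d/M/Y:H:i:s O', $m[2]);
        $stmt->execute([$m[1], $time->format('Y-m-d H:i:s'), $m[3]]);
    }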

You may think of using a dedicated database (or even a dedicated database server) for write-intensive jobs, with specific settings. For example, you may not need InnoDB storage and can keep simple MyISAM. And you could even think of another database storage altogether (as said by @Riccardo Galli).

If you absolutely HAVE to log directly to MySQL, consider using two databases: one optimized for quick inserts, which means no keys other than possibly an auto_increment primary key, and another with keys on everything you'd be querying for, optimized for fast searches. A timed job would copy hits from the insert-only database to the read-only one on a regular basis, and you end up with the best of both worlds. The only drawback is that your available statistics will only be as fresh as the previous "copy" run.
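
As an illustration, the timed copy job could be as simple as the following sketch, assuming both databases live on the same server; the database, table, and column names are hypothetical:

    <?php
    // Copy rows the reporting database has not seen yet. A real job
    // would persist the high-water mark instead of recomputing it.
    $pdo    = new PDO('mysql:host=localhost', 'user', 'pass');
    $lastId = (int) $pdo->query('SELECT COALESCE(MAX(id), 0) FROM report_db.hits')
                        ->fetchColumn();
    $stmt = $pdo->prepare(
        'INSERT INTO report_db.hits SELECT * FROM insert_db.hits WHERE id > ?'
    );
    $stmt->execute([$lastId]);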

I have also previously seen a system which records the data into a flat file on the local disc of each web server (be careful to do only atomic appends if using multiple processes), and periodically writes it into the database asynchronously using a daemon process or cron job.

This appears to be the prevailing optimum solution: your web app remains available if the audit database is down, and users don't suffer poor performance if the database is slow for any reason.

The only thing I can say is: be sure that you have monitoring on these locally generated files - a build-up definitely indicates a problem, and your Ops engineers might not otherwise notice.

You can use a queue strategy with beanstalk or IronQ.
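
A minimal sketch of that queue strategy with beanstalkd, assuming the Pheanstalk client library (the answer itself only names the services); the tube name and payload shape are illustrative:

    <?php
    // The web request only enqueues the hit; a separate worker process
    // drains the tube and writes batches to MySQL at its own pace.
    require 'vendor/autoload.php';

    $queue = Pheanstalk\Pheanstalk::create('127.0.0.1');
    $queue->useTube('hits');
    $queue->put(json_encode([
        'ip'       => $_SERVER['REMOTE_ADDR'],
        'referrer' => $_SERVER['HTTP_REFERER'] ?? '',
        'time'     => time(),
    ]));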

For a heavy volume of writes and this kind of data, you might find mongodb or couchdb more suitable.
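
For illustration, a write through the mongodb/mongodb PHP library could look like this sketch; the database, collection, and field names are assumptions:

    <?php
    // One document per hit; no schema or index setup needed up front.
    require 'vendor/autoload.php';

    $client = new MongoDB\Client('mongodb://127.0.0.1:27017');
    $client->stats->hits->insertOne([
        'ip'       => $_SERVER['REMOTE_ADDR'],
        'referrer' => $_SERVER['HTTP_REFERER'] ?? '',
        'time'     => new MongoDB\BSON\UTCDateTime(), // current time
    ]);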

Because INSERT DELAYED is only supported by MyISAM, it is not an option for many users.

We use MySQL Proxy to defer the execution of queries matching a certain signature.

This will require a custom Lua script; example scripts are here, and some tutorials are here.

The script will implement a Queue data structure for storage of query strings, and pattern matching to determine what queries to defer. Once the queue reaches a certain size, or a certain amount of time has elapsed, or whatever event X occurs, the query queue is emptied as each query is sent to the server.
