
Update MySQL database table with millions of records

I have a user table with the InnoDB engine which has about a million drivers:

CREATE TABLE `user` (
  `Id` int(11) NOT NULL AUTO_INCREMENT,
  `Column2` varchar(14) NOT NULL,
  `Column3` varchar(14) NOT NULL,
  `lat` double NOT NULL,
  `lng` double NOT NULL,
  PRIMARY KEY (`Id`)
) ENGINE=InnoDB;

And I have a mobile application that tracks the locations of users, sends them to the server, and saves them.

Now I am sure that when we go live and millions of drivers send their locations ... the database will go down or become very slow.

How can I avoid slow performance of the MySQL database while normal users use the application (reading/writing records)?

I was thinking about creating a new database just to track drivers' locations, and then having the main database updated via a cronjob, for example, to refresh the users table with lat/lng at some specific interval.

I have a limitation here ... I cannot switch to a NoSQL database at this stage.

3333 rows inserted per second. Be sure to "batch" the inserts in some way. For even higher insertion rates, see http://mysql.rjweb.org/doc.php/staging_table
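A minimal sketch of such batching, assuming a hypothetical user_location table that collects the incoming pings; one multi-row INSERT replaces many single-row round trips:

-- Hypothetical table and values, for illustration only: one statement
-- carries many rows, cutting per-statement and per-commit overhead.
INSERT INTO user_location (user_id, lat, lng)
VALUES
  (101, 25.2048, 55.2708),
  (102, 25.1972, 55.2744),
  (103, 25.2532, 55.3657);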

DOUBLE is overkill for lat/lng, and wastes space. The size of the table could lead to performance problems (when the table gets to be "huge"). For locating a vehicle, FLOAT is probably better -- 8 bytes for 2 floats vs 16 bytes for 2 doubles. The resolution is 1.7 m (5.6 ft). Ref: http://mysql.rjweb.org/doc.php/latlng#representation_choices
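If the ~1.7 m resolution is acceptable, the change is simple; a sketch against the table above:

ALTER TABLE `user`
  MODIFY `lat` FLOAT NOT NULL,   -- 4 bytes instead of 8
  MODIFY `lng` FLOAT NOT NULL;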

On the other hand, if there is only one lat/lng per user, a million rows would be less than 100MB, not a very big table.

What queries are to be performed? Scanning a million rows can be costly: "Find all users within 10 miles (or km)" would require a full table scan. I recommend looking into a bounding box, plus a couple of secondary indexes.
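A sketch of such a bounding-box query against the table above; the box corners are placeholders that would be computed from the search center and radius (roughly 0.09 degrees of latitude per 10 km):

SELECT Id, lat, lng
FROM   `user`
WHERE  lat BETWEEN 25.10 AND 25.30
  AND  lng BETWEEN 55.15 AND 55.35;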

More

The calls to update location should connect, update, disconnect. This will take a fraction of a second, and may not overload max_connections. That setting should not be too high; it could invite trouble. Also set back_log to about the same value.

Consider "connection pooling", the details of which depend on your app language, web server, version of MySQL, etc.

Together with the "bounding box" in the WHERE, have INDEX(lat), INDEX(lng); the Optimizer will pick between them.
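On the table above, that could be (index names are illustrative):

ALTER TABLE `user`
  ADD INDEX idx_lat (lat),
  ADD INDEX idx_lng (lng);

The Optimizer can then range-scan whichever index is more selective for a given box.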

How many CPU cores are in your server? Limit the number of webserver threads to about twice that. This provides another throttling mechanism to avoid the "thundering herd" syndrome.

Turn off the Query cache by having both query_cache_size=0 and query_cache_type=0. Otherwise the QC costs some overhead while essentially never providing any benefit.

Batching INSERTs is feasible. But you need to batch UPDATEs, which is trickier. It should be practical by gathering updates in a table, then doing a single, multi-table UPDATE to copy from that table into the main table. This extra table would work something like the ping-pong I discuss in my "staging_table" link (see the sketch below). But... first let's see if the other fixes are sufficient.
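A sketch of that staging approach, with a hypothetical loc_staging table; the TRUNCATE at the end stands in for the safer "ping-pong" table swap described in the link above:

-- Cheap to insert into: no secondary indexes.
CREATE TABLE loc_staging (
  user_id INT NOT NULL,
  lat FLOAT NOT NULL,
  lng FLOAT NOT NULL
) ENGINE=InnoDB;

-- Fold a whole batch into the main table in one statement.
-- (If a user can appear more than once per batch, keep only the
-- latest ping per user before running this.)
UPDATE `user` u
JOIN   loc_staging s ON s.user_id = u.Id
SET    u.lat = s.lat,
       u.lng = s.lng;

-- Then empty the staging table; a RENAME TABLE swap ("ping-pong")
-- avoids losing pings that arrive while the UPDATE runs.
TRUNCATE TABLE loc_staging;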

Use innodb_flush_log_at_trx_commit = 2. Otherwise, the bottleneck will be logging transactions. The downside (losing up to 1 second's worth of updates) is probably not an issue for your app -- since you will get another lat/lng soon.
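Pulling the server settings mentioned above together, a my.cnf sketch might look like this on MySQL 5.7 or earlier (the query cache is gone in 8.0); the numbers are illustrative, not tuned recommendations:

[mysqld]
max_connections                = 500  # modest; too high invites trouble
back_log                       = 500  # roughly match max_connections
query_cache_size               = 0    # QC off: overhead, no benefit here
query_cache_type               = 0
innodb_flush_log_at_trx_commit = 2    # flush log ~once/second, not per commit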

Finding nearby vehicles -- This is even better than a bounding box, but it is more complex: http://mysql.rjweb.org/doc.php/latlng . How often do you look for "nearbys"? I hope it is not 3333/sec; that is not practical on a single server. (Multiple Slaves could provide a solution.) Anyway, the result set does not change very fast.

There's a lot to unpick here...

Firstly, consider using the spatial data types for storing lat and lng. That, in turn, will allow you to use spatial indexes, which are optimized for finding points within a bounding box.
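A sketch of that migration, assuming MySQL 5.7+ (InnoDB supports SPATIAL indexes from 5.7, and they require a NOT NULL column); the polygon corners are placeholders:

-- Add a POINT column, backfill it, then index it.
ALTER TABLE `user` ADD COLUMN pt POINT NULL;
UPDATE `user` SET pt = POINT(lng, lat);   -- x = lng, y = lat here
ALTER TABLE `user`
  MODIFY pt POINT NOT NULL,
  ADD SPATIAL INDEX idx_pt (pt);

-- Bounding-box search that can use the spatial index:
SELECT Id
FROM   `user`
WHERE  MBRContains(
         ST_GeomFromText('POLYGON((55.15 25.10, 55.35 25.10,
                                   55.35 25.30, 55.15 25.30,
                                   55.15 25.10))'),
         pt);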

Secondly, if you expect such high traffic, you may need some exotic solutions.

Before anything else - set up a test rig, as similar to the production hardware as possible, so you can hunt for bottlenecks. If you expect 100K inserts over a 5-minute period, you're looking at an average of 100,000 / (5 × 60) = 333 inserts per second. But scaling for the average is usually a bad idea - you need to scale for peaks. My rule of thumb is that you need to be able to handle 10 times the average when the average is taken over a 1 - 10 minute window, so you're looking at around 3,300 inserts per second.

I'd use a load-testing tool (JMeter is great) - and make sure the bottleneck is in the target server, not in the load-testing infrastructure. Work out at which load your target system starts to exceed the acceptable response-time boundary - for a simple insert statement, I'd set that at 1 second. If you are using modern hardware, with no triggers and a well-designed table, I'd expect to reach at least 500 inserts per second (my MacBook gets close to that).

Use this test rig to optimize your database schema and indexes - you can get a LOT of performance out of MySQL!

The next step is the painful one - there is very little you can do to increase the raw performance of MySQL inserts (lots of memory, a fast SSD, a fast CPU; you may be able to use a staging table with no indexes to gain another couple of percent). If you cannot hit your target performance with "vanilla" MySQL, you then need to look at more exotic solutions.

The first is the easiest - make your apps less chatty. This will help the entire solution's scalability (I presume you have web/application servers between the apps and the database - they will need scaling too). For instance, rather than sending real-time updates, perhaps the apps can store 1, 5, 10, 60, or 2400 minutes' worth of data and send it as a batch. If you have 1 million daily active users, with peaks of 100,000 active users, it's much easier to scale to 1 million transactions per day than to 100,000 transactions every 5 minutes.

The second option is to put a message-queuing server in front of your database. Message-queuing systems scale much more easily than databases, but you're adding significant additional complexity to the architecture.

The third option is clustering. This allows the load to be spread over multiple physical database servers - but again introduces additional complexity and cost.


 