
MySQL or NoSQL? Recommended way of dealing with large amount of data

I have a database that will be used by a large number of users to store random long strings (up to 100 characters). The table columns will be: userid, stringid and the actual long string.

So it will look pretty much like this:

[Image: sample table with userid, stringid and string columns]

Userid will be unique and stringid will be unique for each user.

The app is like a simple todo-list app, so each user will have about 50 todos on average. I am using the stringid so that users can delete a specific task at any time.

I assume this todo app could end up with 7 million tasks in 3 years' time, and that makes me wary of using MySQL.

So my question is whether this is the recommended way of dealing with a large amount of data with long strings (every new task gets a new row), and whether MySQL is the right database for this kind of project.

I have no experience with large amounts of data yet, and I am trying to plan ahead for the future.

This is not a question of "large amounts" of data (MySQL handles large amounts of data just fine, and 2 million rows isn't "large amounts" in any case).

MySQL is a relational database. So if you have data that can be normalized, that is, distributed among a number of tables so that every data point is saved only once, then you should use MySQL (or MariaDB, or any other relational database).

If you have schema-less data and speed is more important than consistency, then you can/should use some NoSQL database. Personally I don't see how a todo list would profit from NoSQL (it doesn't really matter in this case, but I guess as of now most programming frameworks have better support for relational databases than for NoSQL).

This is a pretty straightforward relational use case. I wouldn't see a need for NoSQL here.

The table you present should work fine; however, I would question the need for the compound primary key as you present it. I would probably put the primary key on stringid only, to enforce uniqueness across all records, rather than use a compound primary key across userid and stringid. I would then put a regular index on userid.

The reason is that if you want to query by stringid only (i.e. for deletes or updates), you are not tied into always querying across both fields to leverage your index (or having to add individual indexes on stringid and userid to enable querying by each field, which means more memory and disk space taken up by indexes).
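To make the layout suggested above concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in so it runs without a MySQL server; the table name `todo`, the column name `task` and the index name `idx_todo_userid` are assumptions for illustration, not names from the question.

```python
import sqlite3

# SQLite is only a stand-in here; the same DDL idea applies to MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE todo (
        stringid INTEGER PRIMARY KEY,   -- unique across all records
        userid   INTEGER NOT NULL,
        task     VARCHAR(100) NOT NULL  -- the long string, up to 100 characters
    );
    -- regular (non-unique) index for per-user lookups
    CREATE INDEX idx_todo_userid ON todo (userid);
""")

conn.executemany(
    "INSERT INTO todo (stringid, userid, task) VALUES (?, ?, ?)",
    [(1, 10, "buy milk"), (2, 10, "call the bank"), (3, 20, "renew passport")],
)

# Delete by stringid alone; the primary key can serve this lookup.
conn.execute("DELETE FROM todo WHERE stringid = ?", (2,))

# List one user's tasks; the userid index can serve this lookup.
tasks_user_10 = conn.execute(
    "SELECT stringid, task FROM todo WHERE userid = ? ORDER BY stringid", (10,)
).fetchall()
remaining = conn.execute("SELECT COUNT(*) FROM todo").fetchone()[0]
print(tasks_user_10, remaining)
```

The trade-off is that stringid has to be unique across the whole table (for example an auto-increment value) rather than unique only within each user.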

As far as whether MySQL is the right solution, this would really be for you to determine. I would say that MySQL should have no problem handling tables with 2 million rows and 2 indexes on two integer id fields. This is assuming you have allocated enough memory to hold these indexes in memory. There is certainly a ton of information available on working with MySQL, so if you are just trying to learn, it would likely be a good choice.

Regardless of what you consider a "large amount of data", modern DB engines are designed to handle a lot. The question of "Relational or NoSQL?" isn't about which option can support more data. Different relational and NoSQL solutions will handle the large amounts of data differently, some better than others.

MySQL can handle many millions of records; SQLite cannot (at least not as effectively). Mongo (NoSQL) attempts to hold its collections in memory (as well as on the file system), so I have seen it fail with fewer than 1 million records on servers with limited memory, although it offers sharding, which can help it scale more effectively.

The bottom line is: the number of records you store should not play into the SQL vs NoSQL decision; that decision should depend on how you will save and retrieve the data. It sounds like your data is already normalized (e.g. UserID), and if you also want consistency when you, for example, delete a user (the TODO items also get deleted), then I would suggest using a SQL solution.
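The "delete a user and the TODO items also get deleted" behaviour can be sketched with a foreign key that cascades deletes. This is an illustration under assumptions: it runs on Python's sqlite3 module as a stand-in for a SQL server, and the `users` and `todos` tables and their columns are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (
        userid INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    CREATE TABLE todos (
        stringid INTEGER PRIMARY KEY,
        userid   INTEGER NOT NULL
                 REFERENCES users (userid) ON DELETE CASCADE,
        task     TEXT NOT NULL
    );
""")

conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany(
    "INSERT INTO todos VALUES (?, ?, ?)",
    [(1, 1, "buy milk"), (2, 1, "call the bank"), (3, 2, "renew passport")],
)

# Removing the user removes that user's todo rows as well.
conn.execute("DELETE FROM users WHERE userid = ?", (1,))

todos_left = conn.execute(
    "SELECT stringid, userid, task FROM todos ORDER BY stringid"
).fetchall()
print(todos_left)
```

With the constraint in place, the database itself keeps the two tables consistent, so the application does not need a separate cleanup step.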

I assume that all queries will reference a specific userid. I also assume that the stringid is a dummy value used internally instead of the actual task-text (your random string).

Use an InnoDB table with a compound primary key on {userid, stringid} and you will have all the performance you need, due to the way a clustered index works.
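A rough sketch of the compound-key layout follows. InnoDB stores rows in primary-key order, so all rows for one userid sit next to each other. To keep the example runnable without a MySQL server, it uses Python's sqlite3 module with a `WITHOUT ROWID` table, which likewise stores rows in primary-key order; that substitution, and the names `todo` and `task`, are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID makes SQLite store the rows in primary-key order,
# used here as an analogue of an InnoDB clustered index.
conn.execute("""
    CREATE TABLE todo (
        userid   INTEGER NOT NULL,
        stringid INTEGER NOT NULL,   -- only needs to be unique per user
        task     TEXT NOT NULL,
        PRIMARY KEY (userid, stringid)
    ) WITHOUT ROWID
""")

rows = [(20, 1, "renew passport"), (10, 2, "call the bank"), (10, 1, "buy milk")]
conn.executemany("INSERT INTO todo (userid, stringid, task) VALUES (?, ?, ?)", rows)

# The same (userid, stringid) pair cannot be inserted twice.
try:
    conn.execute("INSERT INTO todo VALUES (10, 1, 'duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Every query names the userid, so it uses the leading key column.
user_10 = conn.execute(
    "SELECT stringid, task FROM todo WHERE userid = ? ORDER BY stringid", (10,)
).fetchall()

# Deleting one task names both key columns.
conn.execute("DELETE FROM todo WHERE userid = ? AND stringid = ?", (10, 2))
left = conn.execute("SELECT COUNT(*) FROM todo").fetchone()[0]
print(user_10, duplicate_rejected, left)
```

This layout fits the assumption stated above that every query references a specific userid; a lookup by stringid alone would not be able to use the leading column of the key.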


 