
How Do You Organize Big Data in Your Database?

Original post: 2015-09-16 04:13:31 · 6 1 · mysql / database / bigdata

I have some databases containing big data, and I am now thinking about how to organize them so that they scale better.

The points I am considering are:

Security
Performance
Cost

General answers are welcome: I still haven't anticipated every problem or risk that could arise, so any suggestions would help me.

1 answer

To give a full answer to your question, we would need more information on how big the data is, how complex it is, and what your use cases are (e.g., do you run many joins across multiple tables, or do most queries hit a single table?). In any case, here are some pointers to help you get started.

If you expect your data to grow rapidly, I would recommend looking at a cloud-based database solution rather than investing in physical hardware that needs replacing every so often. Cloud-based solutions give you more freedom to scale your database both vertically (bigger machines) and horizontally (more machines). There are specialized cloud database technologies, such as Amazon Redshift and the recently introduced Aurora, that can be reconfigured easily as your requirements grow.
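Horizontal scaling usually means spreading rows across several database nodes (sharding). As a minimal sketch, assuming a simple hash-modulo scheme and hypothetical node names (SHARDS, shard_for and node_for are illustrative only, not part of any cloud product's API), an application could route each key to a node like this:

```python
import hashlib

# Hypothetical node names; in practice these would be connection strings.
SHARDS = ["db-node-0", "db-node-1", "db-node-2"]


def shard_for(key: str, num_shards: int) -> int:
    """Map a row key to a shard index in [0, num_shards)."""
    # Use a stable hash; Python's built-in hash() is randomized per process
    # for strings, so it would route the same key differently across restarts.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def node_for(customer_id: str) -> str:
    """Pick the (hypothetical) database node that stores this customer's rows."""
    return SHARDS[shard_for(customer_id, len(SHARDS))]


if __name__ == "__main__":
    for cid in ["alice", "bob", "carol"]:
        print(cid, "->", node_for(cid))
```

Plain modulo hashing remaps most keys whenever the number of shards changes; consistent hashing or an explicit key-to-shard lookup table is a common way to limit that reshuffling. Managed services may take care of this partitioning for you.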
For performance improvements within the database, look at indexes and structural changes. Use MySQL's EXPLAIN statement to analyze your queries and check whether they use temporary tables or full table scans, both of which slow things down. Adding indexes to the columns you filter or join on can improve performance dramatically.
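To see the kind of thing EXPLAIN reveals, here is a self-contained sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption so the example runs without a server; SQLite's EXPLAIN QUERY PLAN output differs from MySQL's EXPLAIN). The orders table and idx_orders_customer index are hypothetical. It compares the plan for a filtered query before and after indexing the filter column:

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# Assumption: an in-memory SQLite database stands in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(10_000)],
)

sql = "SELECT * FROM orders WHERE customer_id = ?"

# Without an index on customer_id, the planner has to scan every row.
before = query_plan(conn, sql, (42,))

# Index the column used for filtering, then ask for the plan again.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn, sql, (42,))

print("before:", before)
print("after: ", after)
```

In MySQL, the equivalent check is running EXPLAIN SELECT ... and looking for type ALL (a full table scan) in the type column or "Using temporary" in the Extra column. The exact SQLite plan text varies by version (for example "SCAN orders" vs. "SCAN TABLE orders"), so read it rather than matching it literally.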
In data warehouses, you can also denormalize and pre-join tables to improve performance. This can drastically increase your storage use, but because queries then read from a single table, the time spent repeating the same joins over and over is taken out of the equation.
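A minimal sketch of pre-joining, again using sqlite3 as a stand-in for MySQL and hypothetical customers/orders tables: the join is materialized once into a wide table (orders_wide, also hypothetical), and reporting queries then read that single table instead of joining every time.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a MySQL warehouse.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Alice', 'DE'), (2, 'Bob', 'US');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 7.5);
""")

# Pre-join once: customer columns are copied into every order row.
conn.execute("""
    CREATE TABLE orders_wide AS
    SELECT o.id AS order_id, o.amount, c.id AS customer_id, c.name, c.country
    FROM orders o JOIN customers c ON c.id = o.customer_id
""")

# Normalized query: joins on every execution.
joined = conn.execute("""
    SELECT c.country, SUM(o.amount)
    FROM orders o JOIN customers c ON c.id = o.customer_id
    GROUP BY c.country ORDER BY c.country
""").fetchall()

# Denormalized query: reads one table, no join.
flat = conn.execute("""
    SELECT country, SUM(amount)
    FROM orders_wide
    GROUP BY country ORDER BY country
""").fetchall()

print(joined)  # [('DE', 25.0), ('US', 7.5)]
print(flat)    # [('DE', 25.0), ('US', 7.5)]
```

The trade-off is that orders_wide duplicates customer data and goes stale when the source tables change, so it has to be rebuilt or refreshed, for example by a periodic load job. MySQL also supports CREATE TABLE ... SELECT for building such a table.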
If you are dealing with massive datasets that will keep growing in structure and complexity, there are non-relational technologies such as Hadoop and NoSQL databases like Cassandra. Moving to these environments may require rewriting most of your application, so it is worth considering before the data has grown so large that you are forced into it.

EDIT

Privacy and data security also matter, as @Saïd Tahali pointed out in the comments. If you can't host your data externally for legal or security reasons, you will need to invest in your own hardware and address all of the above in-house.