
Database design for a very large amount of data

I am working on a project involving a large amount of data from the Delicious website. The data available for each bookmark is "Date, UserId, Url, Tags".

I normalized my database to 3NF, and because of the nature of the queries we wanted to use in combination, I ended up with 6 tables. The design looks fine. However, now that a large amount of data is in the database, most of the queries need to join at least 2 tables to get the answer, sometimes 3 or 4. At first we didn't have any performance issues, because for testing purposes we had not added much data to the database. Now that we have a lot of data, simply joining extremely large tables takes a lot of time, and for our project, which has to be real-time, this is a disaster.

I was wondering how big companies solve these issues. It looks like normalizing tables just adds complexity, but how do big companies handle large amounts of data in their databases? Don't they use normalization?

Thanks.

Since you asked how big companies (generally) approach this:

They usually have a DBA (database administrator) who lives and breathes the database the company uses.

This means they have people who know everything from how to design the tables optimally and how to profile and tune the queries/indexes/OS/server, to knowing which firmware revision of the RAID controller can cause problems for the database.

You don't say much about what kind of tuning you've done, e.g.:

  • Are you using MyISAM or InnoDB tables? Their performance (and not least their features) is radically different for different workloads.
  • Are the tables properly indexed according to the queries you run?
  • Run EXPLAIN on all your queries. This will help you identify keys that could be added or removed, check whether the proper keys are selected, and compare queries (SQL leaves you with lots of ways to accomplish the same thing).
  • Have you tuned the query cache? For some workloads the query cache (on by default in some MySQL versions) can cause considerable slowdown.
  • How much memory does your box have, and is MySQL tuned to take advantage of it?
  • Do you use a file system and RAID setup geared towards the database?
  • Sometimes a little de-normalization is needed.
  • Different database products have different characteristics. MySQL might be blazingly fast for some workloads and slow for others.
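
To make the indexing and EXPLAIN points concrete, here is a minimal sketch of checking whether a join query actually uses an index. It is not your schema: the `bookmark` and `bookmark_tag` tables, their columns, and the index name are hypothetical, and it uses Python's built-in SQLite (with `EXPLAIN QUERY PLAN`) only so that it runs without a server. On MySQL you would prefix the same query with `EXPLAIN` and read its output instead.

```python
import sqlite3

# Hypothetical, simplified two-table schema (the question does not show its six tables).
# SQLite stands in for MySQL here purely so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookmark (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        url     TEXT    NOT NULL,
        created TEXT    NOT NULL
    );
    CREATE TABLE bookmark_tag (
        bookmark_id INTEGER NOT NULL REFERENCES bookmark(id),
        tag         TEXT    NOT NULL
    );
""")

# A typical "all URLs with this tag" query that joins the two tables.
QUERY = """
    SELECT b.url
    FROM bookmark_tag AS t
    JOIN bookmark AS b ON b.id = t.bookmark_id
    WHERE t.tag = ?
"""

def plan(sql, params=()):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]

# Plan without any index on bookmark_tag.tag.
before = plan(QUERY, ("python",))

# Add an index that matches the WHERE clause and also carries the join column.
conn.execute(
    "CREATE INDEX idx_bookmark_tag_tag ON bookmark_tag (tag, bookmark_id)"
)
after = plan(QUERY, ("python",))

print("before:", before)
print("after: ", after)
```

Comparing the two plans shows whether the new index is picked up for the tag lookup. The same before/after habit applies to every slow query: read the plan, add or change one key, and read the plan again.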
