Database design for a very large amount of data
I am working on a project involving a large amount of data from the Delicious website. The data available for each bookmark is "Date, UserId, Url, Tags".
I normalized my database to 3NF, and because of the kinds of queries we wanted to combine, I ended up with 6 tables. The design looks fine, but now that a large amount of data is in the database, most of the queries need to join at least 2 tables to get the answer, sometimes 3 or 4. At first we didn't have any performance issues, because for testing purposes we had not added much data to the database. Now that we have a lot of data, simply joining extremely large tables takes a lot of time, and for our project, which has to be real-time, this is a disaster.
I was wondering how big companies solve these issues. It looks like normalizing tables just adds complexity, so how do big companies handle large amounts of data in their databases? Don't they use normalization?
Thanks.
Since you asked how big companies (generally) approach this:
They usually have a DBA (database administrator) who lives and breathes the database the company uses.
This means they have people who know everything from how to design the tables optimally and how to profile and tune the queries, indexes, OS and server, down to which firmware revision of the RAID controller can cause problems for the database.
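To make "profile and tune the queries/indexes" a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The two tables, their columns and the index name are hypothetical stand-ins (your actual 6-table schema isn't shown), and SQLite is used only because it runs anywhere; the same idea applies to whatever database you use through its own EXPLAIN facility. It compares the query plan of a two-table join before and after adding an index on the filtered column.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")

# Hypothetical, simplified schema: one row per bookmark, plus a link table
# that associates bookmarks with tags.
conn.executescript("""
    CREATE TABLE bookmark (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        url     TEXT    NOT NULL,
        created TEXT    NOT NULL
    );
    CREATE TABLE bookmark_tag (
        bookmark_id INTEGER NOT NULL REFERENCES bookmark(id),
        tag_id      INTEGER NOT NULL
    );
""")

# A typical join: all URLs bookmarked with a given tag.
SQL = """
    SELECT b.url
    FROM bookmark_tag AS bt
    JOIN bookmark AS b ON b.id = bt.bookmark_id
    WHERE bt.tag_id = ?
"""

# Plan without any index on bookmark_tag.
plan_before = query_plan(conn, SQL, (1,))

# Add an index that matches the WHERE clause and also carries the join column.
conn.execute(
    "CREATE INDEX idx_bookmark_tag_tag_id ON bookmark_tag (tag_id, bookmark_id)"
)

# Plan for the same query once the index exists.
plan_after = query_plan(conn, SQL, (1,))

for label, plan in (("before", plan_before), ("after", plan_after)):
    print(label)
    for step in plan:
        print("   ", step)
```

The exact plan text differs between SQLite versions, but the point is to check whether the filter on the link table is answered from the new index instead of by reading the whole table. Whether a particular index pays off on your data is something only measuring on the real tables can tell you.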
You don't say much about what kind of tuning you've done, e.g.