
Understanding large MySQL data relations

I am trying to teach myself how to use SQL, specifically MySQL.

What I am trying to understand is how to deal with many different types of data within the same table. Say I am building a web application, and I have many different content types (blog items, comments, files, pages, forms), each of which needs its own set of data fields. Would I create a new table for each content type, since each one has its own unique field requirements, or is there a better way to do this? Creating a new table for every content type seems like a lot. If I had 30 types of content in my web app, that would be 30 tables just for the types. And if I added a new content type, I would have to create a new table containing all the fields that type requires.

Is there a better way to handle this when I have many different types of content, each requiring different fields of data in the database? Can I somehow check what type the content is, then select another table that holds all the different field types?

I'm a little confused about what to do.

Just to give an example:

Stack Overflow itself uses the same database table (called Posts) for questions and answers. Even though these two types of data are not identical, the site creators considered them similar enough to put them into one table. There's a PostTypeId field that says whether a post is a question or an answer. On answers, the Title field is NULL; on questions, other columns might be ignored.

Comments, on the other hand, are in a different table. Of course you could theoretically put them into the same Posts table and have a PostTypeId for comments. But the overhead this would create (because comments are so lightweight) justifies creating a new table.
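To make the layout described above concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. Only the table name Posts and the columns PostTypeId and Title come from the description above; the type codes, the ParentId and Body columns, and the layout of the Comments table are assumptions for illustration, not the site's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Posts (
    Id         INTEGER PRIMARY KEY,
    PostTypeId INTEGER NOT NULL,              -- assumed codes: 1 = question, 2 = answer
    ParentId   INTEGER REFERENCES Posts(Id),  -- assumed column: set on answers only
    Title      TEXT,                          -- stays NULL on answers
    Body       TEXT NOT NULL
);
-- Comments are lightweight, so they get their own narrow table.
CREATE TABLE Comments (
    Id     INTEGER PRIMARY KEY,
    PostId INTEGER NOT NULL REFERENCES Posts(Id),
    Body   TEXT NOT NULL
);
""")

conn.execute(
    "INSERT INTO Posts (Id, PostTypeId, Title, Body) VALUES (1, 1, ?, ?)",
    ("How do I model content types?", "Question body"),
)
conn.execute(
    "INSERT INTO Posts (Id, PostTypeId, ParentId, Body) VALUES (2, 2, 1, ?)",
    ("Answer body",),
)
conn.execute("INSERT INTO Comments (PostId, Body) VALUES (2, 'Nice answer')")

# The discriminator column separates the two kinds of row at query time.
questions = conn.execute(
    "SELECT Id, Title FROM Posts WHERE PostTypeId = 1"
).fetchall()
answers = conn.execute(
    "SELECT Id, Title FROM Posts WHERE PostTypeId = 2"
).fetchall()
```

The cost of sharing one table shows up in the second query: every answer row carries a Title column it never uses.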

I know this isn't really an answer, and other developers might well have decided to put questions and answers into different tables; but it gives some perspective. Long story short: It depends :)

Sketch interactions

First, try not to think about database design, but about how entities should interact with each other. Think of each entity as having its own class, which represents the data it requires.

It's always a good start to take pencil and paper and sketch the interactions (or relations) you are trying to accomplish between these entities. Related reading: Learning the Database design process

Extendability and reuse

For example, you want to have a User who can post BlogPosts; each BlogPost can have a set of Tags and a relevant set of Comments. Attachments can be injected into a BlogPost and also into a Comment.

Reusability and extendability are the key. When sketching your interactions, try to isolate dependencies. Think of it in an OO manner. Let's explore Attachment a little more. You can create an Attachment table and then extend Attachment by creating BlogPostAttachment and CommentAttachment, where you can easily create relations between these dependent entities. This creates an easily extendable content type which you can reuse further in, e.g., UserDetailsAttachment.
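One possible reading of the Attachment idea, sketched with Python's built-in sqlite3 module standing in for MySQL. The entity names come from the text above; every column name is a hypothetical choice, and using one row per attachment in the extension tables is just one way to express "extending" a table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
CREATE TABLE BlogPost (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE Comment (
    id           INTEGER PRIMARY KEY,
    blog_post_id INTEGER NOT NULL REFERENCES BlogPost(id),
    body         TEXT NOT NULL
);
-- Shared fields live once, in the base table.
CREATE TABLE Attachment (id INTEGER PRIMARY KEY, file_name TEXT NOT NULL);
-- Each extension table ties an attachment to exactly one kind of owner.
CREATE TABLE BlogPostAttachment (
    attachment_id INTEGER PRIMARY KEY REFERENCES Attachment(id),
    blog_post_id  INTEGER NOT NULL REFERENCES BlogPost(id)
);
CREATE TABLE CommentAttachment (
    attachment_id INTEGER PRIMARY KEY REFERENCES Attachment(id),
    comment_id    INTEGER NOT NULL REFERENCES Comment(id)
);
""")

conn.execute("INSERT INTO BlogPost (id, title) VALUES (1, 'Hello world')")
conn.execute("INSERT INTO Comment (id, blog_post_id, body) VALUES (1, 1, 'First!')")
conn.executemany(
    "INSERT INTO Attachment (id, file_name) VALUES (?, ?)",
    [(1, "diagram.png"), (2, "log.txt")],
)
conn.execute("INSERT INTO BlogPostAttachment (attachment_id, blog_post_id) VALUES (1, 1)")
conn.execute("INSERT INTO CommentAttachment (attachment_id, comment_id) VALUES (2, 1)")

post_files = [row[0] for row in conn.execute(
    "SELECT a.file_name FROM Attachment a "
    "JOIN BlogPostAttachment b ON b.attachment_id = a.id "
    "WHERE b.blog_post_id = ?", (1,))]
comment_files = [row[0] for row in conn.execute(
    "SELECT a.file_name FROM Attachment a "
    "JOIN CommentAttachment c ON c.attachment_id = a.id "
    "WHERE c.comment_id = ?", (1,))]

# Linking to an attachment that does not exist is refused by the database.
try:
    conn.execute("INSERT INTO CommentAttachment (attachment_id, comment_id) VALUES (99, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Adding a UserDetailsAttachment later would mean one more small extension table, with no change to Attachment itself.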

ORMs to the rescue

By studying example code for object-relational mappers like Doctrine or Propel, you can pick up some ideas for table extendability. Practical examples are always the best ones.

Related SO questions, which you may be interested in

I know it's a long way to go, but given what is involved in creating large-scale DB applications with many relations and entity types, it is best to use the help of an ORM in the long run.

You needn't be afraid of using many, many tables - the database will happily deal with lots of them without complaining. If you let each content type have its own table, you get certain advantages:

  1. Simplicity: Each table can be fairly simple, and the constraints are straightforward. For example, if ContentType1 has a field with a relation to another table, you can make that a foreign key in the database design and the RDBMS will take care of data integrity for you.
  2. Indexing efficiency: If ContentType2 needs to be indexed by date but ContentType3 needs to be indexed by name (to take a simple example), having them in two separate tables means each index is there for exactly the data it needs and nothing else. Combining them in one table means you need both indexes covering the combined dataset, which is messier and uses up more disk space.

If you need to output a list combining two content types, a UNION of the two tables is easy; and if you need to do that often with large amounts of data, an indexed view can make it cheap where your RDBMS supports one (MySQL itself does not offer indexed views).
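The separate-tables approach and the combined listing can be sketched as follows, again with Python's built-in sqlite3 module standing in for MySQL. The table names BlogItem and Page, their columns, and the index names are all hypothetical. UNION ALL is used rather than plain UNION because rows from the two tables cannot be duplicates of each other here, so the de-duplication step is unnecessary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each content type gets its own table and only the index it needs.
CREATE TABLE BlogItem (
    id           INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    published_on TEXT NOT NULL   -- ISO date string
);
CREATE INDEX idx_blogitem_published_on ON BlogItem (published_on);

CREATE TABLE Page (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    created_on TEXT NOT NULL
);
CREATE INDEX idx_page_name ON Page (name);
""")

conn.executemany(
    "INSERT INTO BlogItem (title, published_on) VALUES (?, ?)",
    [("First post", "2024-01-10"), ("Second post", "2024-03-05")],
)
conn.execute("INSERT INTO Page (name, created_on) VALUES ('About', '2024-02-01')")

# One combined, date-ordered listing across both content types.
combined = conn.execute("""
    SELECT 'blog' AS kind, title AS label, published_on AS dated FROM BlogItem
    UNION ALL
    SELECT 'page' AS kind, name AS label, created_on AS dated FROM Page
    ORDER BY dated DESC
""").fetchall()

for kind, label, dated in combined:
    print(kind, label, dated)
```

Each branch of the query maps its own columns onto a shared shape (kind, label, dated), which is the only thing the two tables need to agree on.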

On the other hand, if you have two content types which are very similar (as in the Stack Overflow case above, for example), you can get some advantages from combining them into one table:

  1. Simplicity: You only need to code the table once - if done right (i.e. the two content types really are very similar), this can make your codebase smaller and simpler.
  2. Extensibility: If a third content type crops up which is again similar to the first two, and similar in the same way that the first two match each other, the table can straightforwardly be extended to store all three content types.
  3. Indexing for performance: If the most common way of getting at the data is to combine the two content types and order them by date (say), a field which is common to both content types, then it can be inefficient to have two separate tables which must repeatedly be UNIONed and then sorted. Combining the two content types in one table lets you put a single index on the date field, allowing faster querying (though remember you can get a similar benefit from indexed views).

If you normalize rigorously, you will have a database where every entity type has its own table. However, denormalization in various ways (such as combining two entity types in one table) can have benefits which might (depending on the size and shape of your data) outweigh the costs. I'd advise a strategy of keeping all content types separate at least at first, and considering combining them as a tactical denormalization if it turns out to be necessary.

You need to read a book about building websites with PHP and MySQL. It's a good habit to Google first, because some programmers will consider this a lazy question. I suggest reading "Learning PHP MySQL and JavaScript". Anyway, before you start coding your site, you need to plan what kind of information you will store; then you design your database. Say a registration form will contain First_Name, Second_Name, DateOfBirth, Country, Gender and Email. You create a table named, say, "USER_INFO" and assign each column a datatype matching the data you would like to store (number, text, date, and so on); then via PHP you connect to MySQL and store or retrieve the data you want. You really need to read a book or a tutorial to get a full answer, AND GOOGLE :P
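As a rough illustration of the USER_INFO example, here is a sketch in Python with the built-in sqlite3 module rather than PHP and MySQL, purely so that it runs without a database server. The column names come from the text above; the Id column, the column types, the constraints, and the sample data are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE USER_INFO (
        Id          INTEGER PRIMARY KEY,
        First_Name  TEXT NOT NULL,
        Second_Name TEXT NOT NULL,
        DateOfBirth TEXT NOT NULL,   -- ISO date string here; MySQL would use DATE
        Country     TEXT NOT NULL,
        Gender      TEXT,
        Email       TEXT NOT NULL UNIQUE
    )
""")

# Values as they might arrive from a submitted registration form (made-up data).
form = {
    "first_name": "Ada",
    "second_name": "Lovelace",
    "date_of_birth": "1815-12-10",
    "country": "UK",
    "gender": "F",
    "email": "ada@example.com",
}

# Named placeholders keep user input out of the SQL text itself.
conn.execute(
    """
    INSERT INTO USER_INFO
        (First_Name, Second_Name, DateOfBirth, Country, Gender, Email)
    VALUES
        (:first_name, :second_name, :date_of_birth, :country, :gender, :email)
    """,
    form,
)

row = conn.execute(
    "SELECT First_Name, Country FROM USER_INFO WHERE Email = ?",
    ("ada@example.com",),
).fetchone()
print(row)
```

The same plan-then-create-then-query sequence applies in PHP; only the connection and placeholder syntax differ.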

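Notice: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.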

 