
How do I store this hierarchical data using MySQL?

Original post: 2013-08-14 05:14:13. Tags: mysql, tree

I am currently designing a web application that will be used by many businesses, but I am having trouble deciding how to store the data. The general structure of the data is shown in this tree: http://i.imgur.com/lpYwqya.png

So there will be a table that lists every client. Each client has its own users and projects. Each project has two children: users and tasks. Here, "users" refers to the users registered under the client who are allowed to access that project (the table will store the id of each such user and their permission [read/write]). I need to store data for each level of the tree. For instance, a task has the following fields: WBS, Name, Start Date, Finish Date, Duration, Work, Cost, Fixed Cost, Vendor, ...

I am having difficulty deciding how best to structure the data. Note that the data will always be accessed from the top of the tree down (parents to children); I never have to move across children or back up the tree. Here are two solutions I have come up with:

Solution 1: Have an unlimited number of tables. Every time a client is created, two tables are also created: 1_projects and 1_users (where 1 is the id of the client in the first table). When a project is created, a table 1_1_tasks is created, and so on. So the plans table for a risk with id 5, task id 3895, project id 19, and client id 57658 would be: 57658_19_3895_5_plans.

Solution 2: Have 9 tables: clients, users, projects, project_users, tasks, risks, risk_updates, plans, plan_updates. The risks table, in addition to the fields that every risk has, will also have the following: client_id, project_id, task_id. So, for example, if I want to return every risk that a client has for a particular task, I search the whole risks table for rows where client_id = #, project_id = #, task_id = #. Of course, these fields would form a composite/compound key for the risks table. So the risks table would store the risks for every task, from every project, from every client. The last table, plan_updates, would obviously be massive.

I believe solution 1 is strong because it lets me navigate down the tree easily: nodes that do not belong to the same parent are not stored in the same table. However, this solution is also very bad because there will be a massive number of tables, so any later modification to the database would be very difficult.

Solution 2 is strong because all risks are centralized in one table. However, I wonder whether it will be very inefficient when searching, say, the plan_updates table, because I will have to search the entire table (which will be massive) for rows that match the ids of all parent elements.

To put this all into perspective, I anticipate the following:

Users: 1-20 per client. Usually fewer than 5.

Projects: 1-100 per client. Most will have fewer than 20.

Tasks: 100-10,000 per project.

Risks: 0-10 per task. Only around 30% of tasks will have risks, though, and the majority of these will have only 1-4 risks.

Risk Updates: 1-10 per risk.

Plans: 1-5 per risk.

Plan Updates: 1-10 per plan.
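
To put a rough number on "massive", the maxima of the ranges above can be multiplied down the tree. The following is only a back-of-the-envelope sketch: it deliberately ignores the note that only about 30% of tasks have risks, so realistic row counts should be far lower. The constant and function names are made up for illustration.

```python
# Upper bounds taken from the estimates above (the maximum of each range).
MAX_PROJECTS_PER_CLIENT = 100
MAX_TASKS_PER_PROJECT = 10_000
MAX_RISKS_PER_TASK = 10
MAX_UPDATES_PER_RISK = 10
MAX_PLANS_PER_RISK = 5
MAX_UPDATES_PER_PLAN = 10


def worst_case_rows_per_client():
    """Multiply the maxima down the tree to get a per-client row ceiling per table."""
    projects = MAX_PROJECTS_PER_CLIENT
    tasks = projects * MAX_TASKS_PER_PROJECT
    risks = tasks * MAX_RISKS_PER_TASK
    risk_updates = risks * MAX_UPDATES_PER_RISK
    plans = risks * MAX_PLANS_PER_RISK
    plan_updates = plans * MAX_UPDATES_PER_PLAN
    return {
        "projects": projects,
        "tasks": tasks,
        "risks": risks,
        "risk_updates": risk_updates,
        "plans": plans,
        "plan_updates": plan_updates,
    }


ceilings = worst_case_rows_per_client()
for table, rows in ceilings.items():
    print(f"{table:>12}: {rows:,}")
```

These are ceilings for a single client that hits every maximum at once, not a forecast.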

If anyone could shed some light on how best to solve this problem, that would be very helpful.

2 Answers

The second solution seems much more reasonable to me. The biggest flaw in the first solution is the poor manageability of the whole structure. You will very soon end up with a massive number of tables, and any structural change (an extra field or an extra constraint) would then have to be applied to every one of them.

On the other hand, your concerns about compound keys are not that serious.

Tasks, for example, can be assigned to individual projects alone. There is no need for them to reference the client directly as well. On the other hand, it is very likely that at some point you will introduce another n:n (many-to-many) link table connecting users and tasks directly, in order to define who is to carry out a particular task.
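
Such a link table could look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server; the table name task_users, the column names, and the sample rows are all assumptions made for illustration, and the DDL would need small adjustments for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL,  -- a task references its project only, not the client
    name TEXT NOT NULL
);
-- Hypothetical n:n link table: one row per (task, user) assignment.
CREATE TABLE task_users (
    task_id INTEGER NOT NULL REFERENCES tasks(id),
    user_id INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (task_id, user_id)
);
""")

# Made-up sample data.
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO tasks (id, project_id, name) VALUES (?, ?, ?)",
                 [(10, 19, "Pour foundation"), (11, 19, "Frame walls")])
conn.executemany("INSERT INTO task_users (task_id, user_id) VALUES (?, ?)",
                 [(10, 1), (10, 2), (11, 1)])


def assignees(conn, task_id):
    """Return the names of the users assigned to one task, sorted by name."""
    rows = conn.execute(
        "SELECT u.name FROM task_users tu "
        "JOIN users u ON u.id = tu.user_id "
        "WHERE tu.task_id = ? ORDER BY u.name",
        (task_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(assignees(conn, 10))
```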

So, if you want to list all the risks of a task, you first find the task at hand and then use a single key (the task id) to look up rows in the risks table. That remains the same whether you have one table or many.
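
As a minimal sketch of that single-key lookup, assuming a risks table that carries only task_id as its parent reference: the example below uses Python's built-in sqlite3 module in place of MySQL so it can run anywhere, and the column names, index name, and sample rows are invented for illustration (the ids 3895 and 19 are borrowed from the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL,
    name TEXT NOT NULL
);
CREATE TABLE risks (
    id INTEGER PRIMARY KEY,
    task_id INTEGER NOT NULL REFERENCES tasks(id),
    description TEXT NOT NULL
);
-- Index on the parent key so the lookup does not have to read every row.
CREATE INDEX idx_risks_task_id ON risks (task_id);
""")

# Made-up sample data.
conn.executemany("INSERT INTO tasks (id, project_id, name) VALUES (?, ?, ?)",
                 [(3895, 19, "Install wiring"), (3896, 19, "Paint walls")])
conn.executemany("INSERT INTO risks (id, task_id, description) VALUES (?, ?, ?)",
                 [(1, 3895, "Supplier delay"),
                  (2, 3895, "Permit rejected"),
                  (3, 3896, "Weather")])


def risks_for_task(conn, task_id):
    """Fetch all risks of one task using only the task id."""
    return conn.execute(
        "SELECT id, description FROM risks WHERE task_id = ? ORDER BY id",
        (task_id,),
    ).fetchall()


print(risks_for_task(conn, 3895))
```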

I strongly suggest you choose solution #2 and make sure you identify all the relevant primary keys and indexes (and unique columns where applicable). That will keep the database fast and efficient.
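
As one illustration of what identifying keys can buy, here is a sketch of the project_users table from the question with a composite primary key and a permission check. It runs on Python's built-in sqlite3 module rather than MySQL, and the column names and sample values are assumptions. Note that MySQL versions before 8.0.16 parse CHECK constraints but do not enforce them, so an ENUM column may be the more practical choice there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE project_users (
    project_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    permission TEXT NOT NULL CHECK (permission IN ('read', 'write')),
    PRIMARY KEY (project_id, user_id)  -- at most one grant per user per project
);
""")
conn.execute("INSERT INTO project_users VALUES (19, 7, 'read')")


def try_insert(row):
    """Attempt an insert; return False if a constraint rejects the row."""
    try:
        conn.execute("INSERT INTO project_users VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert((19, 7, "write"))        # same (project, user) pair again
bad_permission_ok = try_insert((19, 8, "admin"))   # permission outside read/write
valid_ok = try_insert((19, 8, "write"))            # new pair, valid permission

row_count = conn.execute("SELECT COUNT(*) FROM project_users").fetchone()[0]
print(duplicate_ok, bad_permission_ok, valid_ok, row_count)
```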

Edit

As @MSW mentions, there is a whole lot more to be said about the subject. There is endless literature on database design (with principles like normalization, atomicity ...) that covers it.

One further point against solution #1 is that later on you will not easily be able to run analyses across projects, since the data will be spread over a large number of different tables.
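
With shared tables, by contrast, a cross-project analysis is a single query. The sketch below counts risks per project; it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the schema details, project names, and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id)
);
CREATE TABLE risks (
    id INTEGER PRIMARY KEY,
    task_id INTEGER NOT NULL REFERENCES tasks(id)
);
""")

# Made-up sample data: "Depot" has no tasks and therefore no risks.
conn.executemany("INSERT INTO projects (id, name) VALUES (?, ?)",
                 [(19, "Bridge"), (20, "Tunnel"), (21, "Depot")])
conn.executemany("INSERT INTO tasks (id, project_id) VALUES (?, ?)",
                 [(1, 19), (2, 19), (3, 20)])
conn.executemany("INSERT INTO risks (id, task_id) VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 2), (4, 3)])


def risks_per_project(conn):
    """Count risks for every project, including projects without any."""
    return conn.execute(
        "SELECT p.name, COUNT(r.id) "
        "FROM projects p "
        "LEFT JOIN tasks t ON t.project_id = p.id "
        "LEFT JOIN risks r ON r.task_id = t.id "
        "GROUP BY p.id, p.name "
        "ORDER BY p.name"
    ).fetchall()


print(risks_per_project(conn))
```

With one set of tables per client or project, the same report would need a query assembled dynamically over every table name.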

Stay away from your solution #1. Stick with your solution #2, but with a few changes.

Your Risks table does not need all of these keys: client_id, project_id, task_id. It only needs task_id (as a foreign key), since your Tasks table is already associated with your Projects. The same goes for Plans, Risk Updates, and so on. As you mentioned, you always access the data from the top down (joining the tables from projects to tasks to risks, and so on).
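
A minimal sketch of that layout, where each table references only its direct parent and a top-down query walks the joins: it uses Python's built-in sqlite3 module instead of MySQL so it runs without a server, and the column names, client names, and rows are assumptions (the ids are borrowed from the question).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE projects (
    id INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL REFERENCES clients(id)
);
CREATE TABLE tasks (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id)
);
CREATE TABLE risks (
    id INTEGER PRIMARY KEY,
    task_id INTEGER NOT NULL REFERENCES tasks(id),  -- the only parent key needed
    description TEXT NOT NULL
);
CREATE INDEX idx_projects_client_id ON projects (client_id);
CREATE INDEX idx_tasks_project_id ON tasks (project_id);
CREATE INDEX idx_risks_task_id ON risks (task_id);
""")

# Made-up sample data for two clients.
conn.executemany("INSERT INTO clients (id, name) VALUES (?, ?)",
                 [(57658, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO projects (id, client_id) VALUES (?, ?)",
                 [(19, 57658), (20, 2)])
conn.executemany("INSERT INTO tasks (id, project_id) VALUES (?, ?)",
                 [(3895, 19), (3896, 20)])
conn.executemany("INSERT INTO risks (id, task_id, description) VALUES (?, ?, ?)",
                 [(5, 3895, "Supplier delay"), (6, 3896, "Weather")])


def risks_for_client(conn, client_id):
    """Walk projects -> tasks -> risks for one client."""
    return conn.execute(
        "SELECT r.id, r.description "
        "FROM projects p "
        "JOIN tasks t ON t.project_id = p.id "
        "JOIN risks r ON r.task_id = t.id "
        "WHERE p.client_id = ? "
        "ORDER BY r.id",
        (client_id,),
    ).fetchall()


print(risks_for_client(conn, 57658))
```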