
Schema design

Let's say you are a GM DBA and you have to design a schema around the GM models.

Is it better to do this?

  • table_model
    • type {cadillac, saturn, chevrolet}

Or this?

  • table_cadillac_model
  • table_saturn_model
  • table_chevrolet_model

Let's say that the business lines have the same columns for a model and that there are over a million records for each subtype.

EDIT:

  • there is a lot of CRUD
  • there are a lot of very processor-intensive reports
  • in either schema, there is a model_detail table that contains 3-5 records for each model, and the details for each model differ (you can't add a Cadillac detail to a Saturn model)
  • the dev team doesn't have any issues with DB complexity
  • I'm not really sure that this is a normalization question. Even though the structures are the same, they might be thought of as different entities.

EDIT:

Reasons for partitioning the structure into multiple tables:
  • business lines may have different business rules regarding parts
  • addModelDetail() could be different for each business line (even though the data format is the same)
  • high add/update activity
  • better performance with a partitioned structure instead of a single structure (I'm guessing and not sure here)?

I think this is a variation of the EAV problem. When posed as an EAV design, the single-table structure generally gets voted a bad idea. When posed in this manner, the single-table structure generally gets voted a good idea. Interesting...

I think the most interesting answer is having two different structures: one for CRUD and one for reporting. I think I'll try a concatenated/flattened view for reporting and multiple tables for CRUD, and see how that works.
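The "multiple tables for CRUD, flattened view for reporting" idea can be sketched with SQLite. This is a minimal sketch under stated assumptions: the per-brand table names follow the question, but the columns (`name`, `model_year`) and the view name `all_models` are hypothetical. The view stitches the brand tables together with `UNION ALL` and tags each row with its brand, so reports can treat the data as one table.

```python
import sqlite3

# Brand list taken from the question; the columns below are hypothetical.
BRANDS = ("cadillac", "saturn", "chevrolet")


def build_schema(conn: sqlite3.Connection) -> None:
    """Create one CRUD table per brand plus a flattened reporting view."""
    for brand in BRANDS:
        # Identifiers come from the fixed BRANDS tuple, never from user input.
        conn.execute(
            f"CREATE TABLE table_{brand}_model ("
            "id INTEGER PRIMARY KEY, name TEXT NOT NULL, model_year INTEGER)"
        )
    # One UNION ALL branch per brand table, each row tagged with its brand.
    branches = [
        f"SELECT '{brand}' AS type, id, name, model_year FROM table_{brand}_model"
        for brand in BRANDS
    ]
    conn.execute("CREATE VIEW all_models AS " + " UNION ALL ".join(branches))


conn = sqlite3.connect(":memory:")
build_schema(conn)
conn.execute("INSERT INTO table_cadillac_model (name, model_year) VALUES ('Escalade', 2008)")
conn.execute("INSERT INTO table_saturn_model (name, model_year) VALUES ('Vue', 2008)")

# Reports query the view as if it were a single table.
rows = conn.execute(
    "SELECT type, COUNT(*) FROM all_models GROUP BY type ORDER BY type"
).fetchall()
print(rows)
```

One cost of this design is that the view has to be rebuilt every time a brand table is added.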

Definitely the former example. Do you want to be adding tables to your database whenever you add a new model to your product range?

On data with a lot of writes (e.g. an OLTP application), it is better to have more, narrower tables (e.g. tables with fewer fields). There will be less lock contention because you're only writing small amounts of data into different tables.

So, based on the criteria you have described, the table structure I would have is:

Vehicle
  VehicleType
  Other common fields

CadillacVehicle
  Fields specific to a Caddy

SaturnVehicle
  Fields specific to a Saturn
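Here is a rough SQLite rendering of the layout above. It is a sketch, not the answerer's exact schema. `Vehicle` holds the type discriminator and common fields. Each brand table shares the `Vehicle` primary key and adds its own columns. `ModelName`, `TrimPackage`, and `IsHybrid` are hypothetical stand-ins for "other common fields" and "fields specific to a ...". The common row and the subtype row are written in one transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(
    """
    CREATE TABLE Vehicle (
        VehicleId   INTEGER PRIMARY KEY,
        VehicleType TEXT NOT NULL CHECK (VehicleType IN ('cadillac', 'saturn')),
        ModelName   TEXT NOT NULL              -- hypothetical common field
    );
    -- Subtype tables: the primary key doubles as the FK to Vehicle.
    CREATE TABLE CadillacVehicle (
        VehicleId   INTEGER PRIMARY KEY REFERENCES Vehicle(VehicleId),
        TrimPackage TEXT                       -- hypothetical Caddy-specific field
    );
    CREATE TABLE SaturnVehicle (
        VehicleId INTEGER PRIMARY KEY REFERENCES Vehicle(VehicleId),
        IsHybrid  INTEGER                      -- hypothetical Saturn-specific field
    );
    """
)


def add_cadillac(model_name: str, trim: str) -> int:
    """Insert the common row and the Cadillac-specific row atomically."""
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "INSERT INTO Vehicle (VehicleType, ModelName) VALUES ('cadillac', ?)",
            (model_name,),
        )
        vehicle_id = cur.lastrowid
        conn.execute(
            "INSERT INTO CadillacVehicle (VehicleId, TrimPackage) VALUES (?, ?)",
            (vehicle_id, trim),
        )
    return vehicle_id


vid = add_cadillac("Escalade", "Platinum")
row = conn.execute(
    "SELECT v.VehicleType, v.ModelName, c.TrimPackage "
    "FROM Vehicle v JOIN CadillacVehicle c ON c.VehicleId = v.VehicleId "
    "WHERE v.VehicleId = ?",
    (vid,),
).fetchone()
print(row)
```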

For reporting, I'd have an entirely different database on an entirely different server that does not have the normalized structure (e.g. it just has CadillacVehicle and SaturnVehicle tables, with all of the fields from the Vehicle table duplicated into them).

With proper indexes, even the OLTP database could be performant for your SELECTs, despite there being tens of millions of rows. However, since you mentioned that there are processor-intensive reports, I would have a completely separate reporting database.
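To illustrate the "proper indexes" point, the sketch below puts an index on the single table's `type` column. It then asks SQLite, via `EXPLAIN QUERY PLAN`, whether a per-brand lookup would use that index. The table layout and the index name `idx_model_type` are assumptions. The data set is tiny because the plan, not the timing, is what matters here, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_model ("
    "id INTEGER PRIMARY KEY, type TEXT NOT NULL, name TEXT NOT NULL)"
)
# Index the discriminator so per-brand queries need not scan every row.
conn.execute("CREATE INDEX idx_model_type ON table_model (type)")

conn.executemany(
    "INSERT INTO table_model (type, name) VALUES (?, ?)",
    [("cadillac", "Escalade"), ("saturn", "Vue"), ("chevrolet", "Impala")],
)

# The last column of each EXPLAIN QUERY PLAN row is a human-readable detail.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM table_model WHERE type = ?",
    ("saturn",),
).fetchall()
detail = plan[0][-1]
print(detail)  # should mention idx_model_type rather than a full scan
```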

One last comment, about the business rules: the data store doesn't care about business rules. If the business rules differ between models, that really shouldn't factor into your design decisions about the database schema (other than to help dictate which fields are nullable and what their data types are).
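One way to honor "the data store cares not about the business rules" is sketched below. Per-brand rules live in application code (the question's `addModelDetail()`), and every brand writes to the same `table_model` / `model_detail` tables. The part-number prefixes in `PART_RULES` are invented purely for illustration, and `add_model_detail` is a hypothetical function, not an existing API.

```python
import sqlite3
from typing import Callable, Dict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_model (
        id INTEGER PRIMARY KEY, type TEXT NOT NULL, name TEXT NOT NULL);
    CREATE TABLE model_detail (
        id INTEGER PRIMARY KEY,
        model_id INTEGER NOT NULL REFERENCES table_model(id),
        part TEXT NOT NULL);
    """
)

# Hypothetical per-brand rules; the prefixes are made up for this sketch.
PART_RULES: Dict[str, Callable[[str], bool]] = {
    "cadillac": lambda part: part.startswith("CAD-"),
    "saturn": lambda part: part.startswith("SAT-"),
    "chevrolet": lambda part: part.startswith("CHV-"),
}


def add_model_detail(model_id: int, part: str) -> None:
    """Validate with the model's brand rule, then write to the shared table."""
    row = conn.execute("SELECT type FROM table_model WHERE id = ?", (model_id,)).fetchone()
    if row is None:
        raise ValueError(f"no model with id {model_id}")
    if not PART_RULES[row[0]](part):
        raise ValueError(f"part {part!r} not allowed for a {row[0]} model")
    with conn:
        conn.execute(
            "INSERT INTO model_detail (model_id, part) VALUES (?, ?)", (model_id, part)
        )


conn.execute("INSERT INTO table_model (type, name) VALUES ('saturn', 'Vue')")
add_model_detail(1, "SAT-1001")  # accepted
try:
    add_model_detail(1, "CAD-2002")  # a Cadillac part on a Saturn model
    rejected = None
except ValueError as exc:
    rejected = str(exc)
print(rejected)
```

The schema stays identical for every brand. Only the rule lookup changes, so a brand-specific rule does not require a brand-specific table.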

Use the former. Setting up separate tables for the specialisations will complicate your code and doesn't bring any advantages that can't be achieved in other ways. A single table will also massively simplify your reports.

If the tables really do have the same columns, then the former is the best way to do it. Even if they had different columns, you'd probably still want to keep the common columns in their own table and store a type designator.

You could try having two separate databases.

One is an OLTP (OnLine Transaction Processing) system, which should be highly normalized so that the data model is as correct as possible. Report performance should not be a concern there, and you would deal with non-reporting query performance using indexes, denormalization, etc. on a case-by-case basis. The data model should match the conceptual model very closely.

The other is a reporting system, which should periodically pull data from the OLTP system and massage and rearrange it in a way that makes report generation easier and more performant. This data model should not try to match the conceptual model too closely. You should be able to regenerate all the data in the reporting database at any time from the data currently in the main database.
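The "regenerate at any time" property can be sketched with two SQLite databases. Here `oltp` holds the normalized tables, and `report` holds a hypothetical denormalized `model_report` table. The table and column names are assumptions. `rebuild_reporting` drops the reporting table and derives it again from the OLTP data, so running it twice gives the same result.

```python
import sqlite3

oltp = sqlite3.connect(":memory:")
oltp.executescript(
    """
    CREATE TABLE table_model (id INTEGER PRIMARY KEY, type TEXT NOT NULL, name TEXT NOT NULL);
    CREATE TABLE model_detail (id INTEGER PRIMARY KEY,
        model_id INTEGER NOT NULL REFERENCES table_model(id), part TEXT NOT NULL);
    INSERT INTO table_model (type, name) VALUES ('saturn', 'Vue'), ('cadillac', 'Escalade');
    INSERT INTO model_detail (model_id, part) VALUES (1, 'SAT-1'), (1, 'SAT-2'), (2, 'CAD-1');
    """
)
report = sqlite3.connect(":memory:")  # stands in for the separate reporting server


def rebuild_reporting(src: sqlite3.Connection, dst: sqlite3.Connection) -> None:
    """Discard the reporting table and regenerate it entirely from OLTP data."""
    rows = src.execute(
        "SELECT m.type, m.name, COUNT(d.id) "
        "FROM table_model m LEFT JOIN model_detail d ON d.model_id = m.id "
        "GROUP BY m.id"
    ).fetchall()
    with dst:
        dst.execute("DROP TABLE IF EXISTS model_report")
        dst.execute("CREATE TABLE model_report (type TEXT, name TEXT, detail_count INTEGER)")
        dst.executemany("INSERT INTO model_report VALUES (?, ?, ?)", rows)


rebuild_reporting(oltp, report)
rebuild_reporting(oltp, report)  # safe to rerun: nothing accumulates
result = report.execute("SELECT * FROM model_report ORDER BY name").fetchall()
print(result)
```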

I would say the first way looks better.

Are there reasons you would want to do it the second way?

The first way follows normalization better and is closer to how most relational database schemas are developed.

The second way seems to be harder to maintain.

Unless there is a really good reason for doing it the second way, I would go with the first method.

Given the description that you have given us, the answer could be either.

In other words, you haven't given us enough information to give a decent answer. Please describe what kind of queries you expect to perform on the data.

[Having said that, I think the answer is going to be the first one ;-) As I imagine it, even though they are different models, the data for each model is probably going to be quite similar.

But this is a complete guess at the moment.]

Edit: Given your updated edit, I'd say the first one definitely. As they all have the same data, they should go into the same table.

Another thing to consider in defining "better": will end users be querying this data directly? Highly normalized data is difficult for end users to work with. Of course, this can be overcome with views, but it's still something to think about as you finalize your design.
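Here is a minimal sketch of hiding normalization behind a view, assuming a hypothetical normalized layout with a `brand` lookup table, a `model` table, and a `model_detail` table. End users query `model_overview` (also a hypothetical name) and never write the joins or the aggregation themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE brand (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE model (id INTEGER PRIMARY KEY,
        brand_id INTEGER NOT NULL REFERENCES brand(id), name TEXT NOT NULL);
    CREATE TABLE model_detail (id INTEGER PRIMARY KEY,
        model_id INTEGER NOT NULL REFERENCES model(id), part TEXT NOT NULL);

    -- Flat, friendly shape for end users; the joins live in the view.
    CREATE VIEW model_overview AS
    SELECT b.name AS brand, m.name AS model, COUNT(d.id) AS detail_count
    FROM model m
    JOIN brand b ON b.id = m.brand_id
    LEFT JOIN model_detail d ON d.model_id = m.id
    GROUP BY m.id;

    INSERT INTO brand (name) VALUES ('Saturn'), ('Cadillac');
    INSERT INTO model (brand_id, name) VALUES (1, 'Vue'), (2, 'Escalade');
    INSERT INTO model_detail (model_id, part) VALUES (1, 'SAT-1'), (1, 'SAT-2');
    """
)

rows = conn.execute("SELECT * FROM model_overview ORDER BY brand").fetchall()
for row in rows:
    print(row)
```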

I do agree with the other two folks who answered: which form is "better" is subjective and depends on what you're hoping to achieve. If you're hoping to achieve very quick queries, that's one thing. If you're hoping to achieve high programmer productivity, that's a different goal again, and it may conflict with quick queries.

The choice depends on the required performance. The best database is a normalized database. But a normalized database can have performance issues, and then you have to denormalize it. The principle "Normalize first, denormalize for performance" works well.

It depends on the data model and the use case. If you ever need a report that queries data across the "models", then the former is preferable, because with the latter you'd have to change the query (to include the new table) every time you added a new model.
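The sketch below shows why the single-table design helps cross-model reports, using a hypothetical `table_model` with a `type` column. The report query never names a brand or a table. Adding a new line of business therefore means inserting rows, and the unchanged query picks them up automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_model (id INTEGER PRIMARY KEY, type TEXT NOT NULL, name TEXT NOT NULL)"
)

# The report never mentions a specific brand or per-brand table.
REPORT_SQL = "SELECT type, COUNT(*) FROM table_model GROUP BY type ORDER BY type"

conn.executemany(
    "INSERT INTO table_model (type, name) VALUES (?, ?)",
    [("cadillac", "Escalade"), ("saturn", "Vue")],
)
before = conn.execute(REPORT_SQL).fetchall()

# A new brand is just new rows, not a new table...
conn.execute("INSERT INTO table_model (type, name) VALUES ('chevrolet', 'Impala')")
after = conn.execute(REPORT_SQL).fetchall()  # ...and the same query includes it.
print(before)
print(after)
```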

Oh, and by "former" we mean this option:

table_model
* type {cadillac, saturn, chevrolet}

@mson has asked the question "What do you do when a question is not satisfactorily answered on SO?", which is a direct reference to the existing answers to this question.

I contributed the following answer to that discussion, primarily critiquing the way the question was asked.


Quote (verbatim):

I looked at the original question yesterday, and decided not to contribute an answer.

One problem was the use of the term 'model' as in 'GM models' - which cited 'Chevrolet, Saturn, Cadillac' as 'models'. To my understanding, these are not models at all; they are 'brands', though there might also be an industry-insider term for them that I'm not familiar with, such as 'division'. A model would be a 'Saturn Vue' or 'Chevrolet Impala' or 'Cadillac Escalade'. Indeed, there could well be models at a more detailed level than that - different variants of the Saturn Vue, for example.

So, I didn't think that the starting point was well framed. I didn't critique it; it wasn't quite compelling enough, and there were answers coming in, so I let other people try it.

The next problem is that it is not clear what your DBMS is going to be storing as data. If you're storing a million records per 'model' ('brand'), then what sorts of data are you dealing with? Lurking in the background is a different scenario - the real scenario - and your question has used an analogy that failed to be sufficiently realistic. That means that the 'it depends' parts of the answer are far more voluminous than the 'this is how to do it' ones. There is just woefully too little background information on the data to be modelled to allow us to guess what might be best.

Ultimately, it will depend on what uses people have for the data. If the information is going to go flying off in all different directions (different data structures in different brands; different data structures at the car model levels; different structures for the different dealerships - the Chevrolet dealers are handled differently from the Saturn dealers and the Cadillac dealers), then the integrated structure provides limited benefit. If everything is the same all the way down, then the integrated structure provides a lot of benefit.

Are there legal reasons (or benefits) to segregating the data? To what extent are the different brands separate legal entities where shared records could be a liability? Are there privacy issues, such that it will be easier to control access to the data if the data for the separate brands is stored separately?

Without a lot more detail about the scenario being modelled, no-one can give a reliable general answer - at least, not more than the top-voted one already gives (or doesn't give).

  • Data modelling is not easy.
  • Data modelling without sufficient information is impossible to do reliably.

I have copied the material here since it is more directly relevant. I do think that to answer this question satisfactorily, a lot more context should be given. And it is possible that the extra context needed would make SO the wrong place to ask it. SO has its limitations, and one of those is that it cannot deal with questions which require long explanations.

From the SO FAQs page:

What kind of questions can I ask here?

Programming questions, of course! As long as your question is:

  • detailed and specific
  • written clearly and simply
  • of interest to at least one other programmer somewhere

...

What kind of questions should I not ask here?

Avoid asking questions that are subjective, argumentative, or require extended discussion. This is a place for questions that can be answered!

This question is, IMO, close to the 'require extended discussion' limit.

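Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.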

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM